1. I am the inventor on the above-referenced patent application. I am employed at 
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139. I have been advised 
that Massachusetts Institute of Technology is the assignee of the entire right, title and interest of 
the subject application. 

2. I have read the United States Patent and Trademark Office Action dated August 18, 2004, 
the Office Action dated January 8, 2004, the Office Action dated September 26, 2002, and the art 
cited by the Examiner in the Office Actions, in particular the references Kervinen et al., 
Atherosclerosis 105: 89-95 (1994); Margaglione et al., Stroke 29: 399-403 (February 1998); and 
Paik et al., 82: 3445-3449 (1985). I have also read the patent application and the presently 
pending claims that were rejected in the August 18, 2004 Office Action. 

3. I note that the Examiner stated in the Office Action dated August 18, 2004 that Claims 
have been rejected under 35 U.S.C. § 102(b) as being anticipated by Kervinen et al. as evidenced 
by Margaglione et al. I have read and understand the Examiner's interpretation of the Kervinen 
et al. and Margaglione et al. references. However, as discussed in this Declaration, the claimed 
invention can be distinguished from Kervinen et al. (1994) and similar published works which 
claimed to have found statistically significant associations between a single mutant allele in a 
gene and risk of a particular disease or early mortality in general. 

4. Kervinen et al. (1994) and many others reviewed by Hirschhorn et al. (Genet. Med. 4(2): 
45-61 (2002), attached herein as Appendix A) represent a general approach in which the 
frequency of each and any single allele is measured in two population samples and the 
frequencies are compared to discover if the absolute value of the difference is significantly 
different from 0.00 or if the ratio of the frequencies is significantly different from 1.00. 

This approach was rooted in the widely held belief in population genetics that common 
diseases, including common mortal diseases, are encoded entirely or predominantly by specific 
single mutations in one or more genes. Sickle cell anemia in African populations and cystic 
fibrosis among northern Europeans are commonly cited in support of this general belief. I, 
however, regarded these two examples as exceptions to the general rule that inherited 
diseases are encoded by multiple, tens to hundreds of, different alleles within a gene or genes, a 
scientific point of view now amply supported by data for nearly two thousand rare inherited 
human diseases (http://www.hgmd.org). In addition, recent studies have discovered a gene, 
MC1R, that encodes risk for the common skin cancers by modulating the tanning 
response in Europeans and Asians, and for which some 65 putatively active alleles have been 
identified, the summed frequencies of which total between 0.1 and 0.2 in Europeans. 

As each of multiple alleles would encode a small fraction of the risk encoded by the 
multiple alleles, impractically large populations would need to be sampled to discern a 
statistically significant difference between young and aged populations for a single allele in a 
multi-allelic set of alleles that encoded risk for a mortal disease. Furthermore, any gene carrying 
alleles coding for risk of mortal disease would, like all genes, carry multiple neutral alleles that do 
not confer risk of mortal disease. In determining whether or not a particular gene carries alleles 
that encode risk, the analyst does not know a priori the actual alleles carried by the gene in the 
general population. Even were the alleles known, the analyst could not specifically identify 
precisely which alleles conferred risk. For instance, some amino acid substitutions inactivate the 
function of an enzyme and some do not. 

5. I have devised a method that overcomes these difficulties and also reduces the size of 
population samples required to obtain statistically significant results. My approach has now 
been applied to test and negate the hypothesis that the gene CTLA4 carried alleles conferring 
risk for juvenile (Type I) diabetes, a widely-held belief based inappropriately on data from a 
single allele of that gene. 

My claimed method determines if any gene carries mutations (or alleles) in the general 
population that increase the risk of any common mortal disease. My method requires large 
samples of young and aged individuals from the same population, scanning gene segments 
encoding functional elements such as protein sequences and mRNA splice sites, and 
enumerating, or both enumerating and identifying, the set of all detectable mutations carried by 
any gene in both young and old populations. If a statistically significant difference exists 
between the young and aged groups in the total number of mutations, the total number of 
non-synonymous mutations, or the total number of obligatory knockout mutations (i.e., the sum 
of stop codons + frameshifts + mRNA splice site mutations), then the gene is identified as one 
that, with a high degree of probability, carries mutations that code for a common mortal disease. 
Dependent claims outline methods to extend such a finding to identify, or significantly limit the 
number of, the many possible mortal diseases which might be caused or accelerated by the 
risk-conferring mutations carried by a particular identified gene. 
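
The three mutation totals named above can be pictured with a short sketch. This is illustrative only and is not taken from the application: the class name, its fields and every count shown are hypothetical, chosen solely to show how the obligatory knockout total is formed from its three components and how per-allele frequencies for the two cohorts might be tabulated.

```python
from dataclasses import dataclass


@dataclass
class GeneScanTally:
    """Hypothetical tally of mutations found by scanning one gene in one cohort."""
    alleles_scanned: int   # total alleles examined (normal + mutant)
    total_mutations: int   # all detectable mutations
    nonsynonymous: int     # all non-synonymous mutations
    stop_codons: int
    frameshifts: int
    splice_site: int

    @property
    def knockouts(self) -> int:
        # Obligatory knockouts = stop codons + frameshifts + mRNA splice site mutations.
        return self.stop_codons + self.frameshifts + self.splice_site

    def frequencies(self) -> dict:
        # Frequency of each mutation class per allele scanned.
        n = self.alleles_scanned
        return {
            "all": self.total_mutations / n,
            "nonsynonymous": self.nonsynonymous / n,
            "knockout": self.knockouts / n,
        }


# Invented counts, for illustration only.
young = GeneScanTally(20000, 400, 240, 30, 20, 10)
aged = GeneScanTally(20000, 330, 180, 18, 12, 6)

for name in ("all", "nonsynonymous", "knockout"):
    difference = young.frequencies()[name] - aged.frequencies()[name]
    print(name, difference)
```

Whether any such difference is statistically significant is a separate question, addressed by the test described later in this Declaration.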

6. I combined information from two disparate fields of research, epidemiology and 
mutational spectrometry, to make this invention. 

From epidemiology, specifically the public health records of the United States, I organized 
the mortality rate data for cancers, vascular diseases and other causes of mortality from 1890 to 
1997 so that the fraction of surviving persons dying from any of the diseases could be observed 
as a function of age (http://epidemiology.mit.edu). From these data and a self-generated 
mathematical model of the population in which mutations in one or more genes caused or 
accelerated a mortal disease, applicant derived a quantitative means to estimate the expected loss 
of disease causing/accelerating alleles in said gene or genes as the population aged. Using 
pancreatic cancer as an example, it was found that between ages ~50 and ~100 the fraction of the 
population at future risk of pancreatic cancer declined fivefold. This finding suggested that, given 
population samples of old and young persons from the same large population, the alleles of any 
gene conferring risk for a mortal disease would decrease significantly between age 50 and 
extreme old age. I believe that prior to this work no means existed to calculate the expected 
fractional decrease in the alleles that encode risk for any specific mortal disease as a function of 
age from the public mortality records of a country. 

7. From mutational spectrometry, I determined from review and organization of the existing 
literature that for nearly all known diseases, including mortal diseases, caused by inherited 
mutations in one or more genes, disease risk is encoded not by one or even two mutant alleles, 
but by many separate alleles distributed in a large population. This finding, demonstrated by 
what are now more than 1954 separate gene/disease relationships with an average of some 25 
disease-causing mutations per gene (http://www.hgmd.org), led applicant to teach the necessity 
of scanning a gene of interest for a set of multiple different mutations, each independently 
conferring disease risk. 

I specifically teach the necessity of scanning all of the exons and splice sites of a gene, or 
as great a portion of the gene as technically practical, using the same analytical mode for 
analysis of both young and aged population samples. I further teach that said scanning to 
discover all detectable alleles for a gene in both populations is required because it is expected 
that individual mutations that confer risk must be individually more rare than the sum of all such 
mutations. 

8. I recognize that any gene will in general be found to carry a large number of mutants or 
alleles that do not change the molecular functionality of the gene or derived gene products. I 
teach that, despite this fact, in the case of a gene encoding a risk for a common mortal 
disease, the total number of mutations or alleles within the exons and splice sites encoding risk is 
large enough to permit recognition of a significant difference between young and aged 
populations. 

The report of Kervinen et al. (1994), Margaglione et al. (1998) and similar reports must be 
considered in light of the above discussion about the elements of the specified method to 
discover if a gene carries alleles that confer risk for a mortal disease. In particular, Kervinen et 
al. must be considered in light of Hirschhorn et al., 2002, in which the entire class of studies 
represented by Kervinen et al. (1994) is found to be irreproducible and thus valueless in 
discovering genes that code for common diseases or common mortal diseases. 

9. Kervinen et al. (1994) is one of several hundreds of studies in which high frequency 
mutant alleles (known as single nucleotide polymorphisms or SNPs) distributed in the general 
population have been tested to discover if there is a statistically significant association, as 
indicated by a decreased frequency among the aged or an increased frequency in sample cohorts 
with a particular disease relative to a sample cohort of young adults drawn from the same general 
population. It is a matter of public record that the search based on SNPs for genes carrying 
alleles for common disease has failed to produce a single valid discovery (Wall Street Journal, 
14 January 2005). 

Kervinen et al. (1994) specifically did not scan the gene of interest for the set of all 
mutations to discover if there were a significant decrease in all alleles, in all non-synonymous 
alleles or in all obligatory knockout alleles in aged persons as in the claimed method. 

Kervinen et al. (1994) claim that their "findings strongly suggest that the presence of these 
potential genetic risk factors for CHD (coronary heart disease) decreases the probability of an 
individual reaching an extreme old age." However, I respectfully submit that Kervinen et al. 
(1994) did not perform an appropriate statistical analysis and, like nearly all others who have 
published findings based on single allele comparisons, convinced themselves that they had 
observed significant age-specific allelic decline when they had not. 

10. The following is a description of a standard statistical means by which allelic frequencies 
may be compared between any two populations. This statistical statement is then applied to the 
data of Kervinen et al. I have also applied the same statistical analysis to the specific teachings 
of this application in which some of the alleles within the exons and splice sites of a particular 
gene encode risk for a common disease. 

In general, the problem of comparing the frequency of alleles in a single gene in 
population A to population B is to discover if the differences in the allele frequencies are 
significantly greater than zero. 

Let the frequency of all discovered alleles for a given gene in population A be a/A where 
"a" is the number of mutant alleles in a sample containing "A" total alleles (normal + mutant 
alleles). 

Let the frequency of all discovered alleles for a given gene in population B be b/B where 
"b" is the number of mutant alleles in a sample containing "B" total alleles (normal + mutant 
alleles). 

The statistical question is whether or not 

X(a,A,b,B) = (a/A) - (b/B) > 0. 

As a, A, b, and B may be treated as independent variables, and the values of A and B are 
defined by the experimenter, one may straightforwardly calculate the variance of X as a function 
of derived variables in which the variances of the sample sizes, A and B, are zero. 

Variance(X) = V(X) = a/A^2 + b/B^2 

Standard Deviation = Variance^(1/2) 

Standard Deviation(X) = (a/A^2 + b/B^2)^(1/2) 

Now the statistical question of whether or not X > 0 reduces to the question of whether 
or not 

X = [(a/A) - (b/B)] - quant (a/A^2 + b/B^2)^(1/2) > 0 

"quant" is the multiplier derived from the Normal or Poisson distributions to define the degree 
of confidence that an observation has not occurred by chance. Typically biologists use the degree 
of confidence of (1 - 0.05) to indicate a significant difference between two measurements (such 
as weight of boys versus weight of girls), for which a quant = 1.65 expresses the desired 
confidence interval. However, the search for genes conferring mortal risk or causing a particular 
disease does not conform to this simple experiment. This is because there are more than 7.4 
million common alleles or SNPs in the National Human Genome Database that affect human 
mortality. This large number of SNPs, in addition to the ApoE ε4 allele examined by 
Kervinen et al., is "hidden" in experiments such as Kervinen et al. (1994) and unaccounted for in 
their attempt to analyze their data. 
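
The test statistic defined in this paragraph can be written out as a short function. The following is a minimal sketch assuming the variance expression given above, a/A^2 + b/B^2 (the mutant allele counts a and b treated as independent counts, the sample sizes A and B as fixed). The function name and the example counts are hypothetical and do not come from any study.

```python
import math


def allele_difference_statistic(a, A, b, B, quant):
    """Return X = (a/A - b/B) - quant * (a/A^2 + b/B^2)^(1/2).

    a, b: mutant allele counts in samples A and B.
    A, B: total alleles (normal + mutant) in each sample.
    quant: multiplier setting the desired degree of confidence.
    A positive result means the difference exceeds the chosen threshold.
    """
    difference = a / A - b / B
    standard_deviation = math.sqrt(a / A**2 + b / B**2)
    return difference - quant * standard_deviation


# Hypothetical counts: 200 mutant alleles of 1000 versus 100 of 1000.
x = allele_difference_statistic(200, 1000, 100, 1000, quant=1.65)
print(round(x, 4))  # positive, so the difference passes at quant = 1.65
```

Raising quant shrinks X for the same counts, which is why the choice of multiplier matters as much as the observed difference.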

11. This kind of statistical problem was appreciated and a solution developed by the 
statistician Bonferroni. He offered the expedient of defining the confidence interval such that 
there would be, for instance, a 0.05 probability that any of the pairwise possibilities lay outside 
the desired confidence interval by chance. This desired confidence interval he found to be 
defined by the ratio: (desired degree of confidence) / (number of pairwise possibilities). In the 
case of trials of significance for single alleles as in Kervinen et al., the degree of confidence of 
(1 - 0.05) would be represented by the confidence interval in which 0.05/(7.4 million) of the area 
under the normal distribution lies outside the interval limits. 

For this (1 - 0.05) degree of confidence from a two sided normal distribution the value of 
quant is somewhat greater than 5.5 for any trial involving any single allele of an estimated 7.4 
million common alleles in the human genome. (This value of 7.40 million is the reported number 
of variant alleles from a population size of considerably fewer than one hundred people and 
must be regarded as an underestimate of the actual number of common allelic variants each 
occurring in 1% or more of the world's populations.) 

For this (1 - 0.05) degree of confidence from a two sided normal distribution the value of 
quant is about 4.0 for any trial involving counting all of the alleles in a population sample for 
any single gene of an estimated 25,000 genes in the human genome, the method claimed in this 
application. 
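
The dependence of quant on the number of comparisons can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the calculation used in this Declaration: it assumes a Normal distribution and a Bonferroni division of the 0.05 allowance across all comparisons, with the one-sided or two-sided choice left as a parameter. The function name is hypothetical, and the exact figures it returns depend on those conventions, so they are illustrative rather than a restatement of the values quoted above.

```python
from statistics import NormalDist


def bonferroni_quant(alpha, n_comparisons, two_sided=True):
    """Normal multiplier for a Bonferroni-adjusted significance level.

    alpha: overall allowance for a chance finding (e.g. 0.05).
    n_comparisons: number of alleles or genes that could have been tested.
    two_sided: split the per-comparison allowance across both tails.
    """
    per_comparison = alpha / n_comparisons
    tail = per_comparison / 2 if two_sided else per_comparison
    # Use the lower tail and flip the sign; this is numerically safer
    # than evaluating the inverse CDF very close to 1.
    return -NormalDist().inv_cdf(tail)


single_test = bonferroni_quant(0.05, 1, two_sided=False)
per_allele = bonferroni_quant(0.05, 7_400_000)
per_gene = bonferroni_quant(0.05, 25_000)
print(single_test, per_allele, per_gene)
```

Under these assumptions the multiplier for one of 25,000 genes is smaller than the multiplier for one of 7.4 million alleles, which is the qualitative point made in the text.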

The difference in the quant values for the two kinds of approaches, single allele 
comparisons versus single gene, total allele comparisons, has major implications for their 
practical application. The sample sizes necessary to recognize significant differences are much 
greater for single allele studies than for the gene scanning method taught by applicant. 

Thus for experiments involving a single allele (or a large number of alleles including 7.4 
million alleles) the test for a significant age-specific allelic decline must satisfy the proposition 

X = [(a/A) - (b/B)] - 7.5 (a/A^2 + b/B^2)^(1/2) > 0 

For experiments involving a single allele out of 7.4 million common human alleles, the test for a 
significant age-specific allelic decline must satisfy the proposition 

X = [(a/A) - (b/B)] - 5.5 (a/A^2 + b/B^2)^(1/2) > 0 

Table 3 of Kervinen et al. supplies the necessary values of a, A, b, B to test the hypothesis that 
the single ε4 allele demonstrated a statistically significant decrease between the young and 
middle aged adults and nonagenarians (age >90). 

a = number of ε4 alleles (104) in young and middle aged 
A = number of total alleles (520) in young and middle aged 
b = number of ε4 alleles (19) in nonagenarians 
B = number of total alleles (190) in nonagenarians 

X = (a/A - b/B) = 104/520 - 19/190 = 0.2 - 0.1 = 0.1 

Now the question is whether or not 

X = 0.1 - 5.5 (104/520^2 + 19/190^2)^(1/2) > 0 ? 
or 
X = 0.1 - 5.5 (0.0003846 + 0.0005263)^(1/2) > 0 ? 
X = 0.1 - 5.5 (0.03) = 0.1 - 0.165 = -0.065 > 0 ? 

As -0.065 is manifestly not greater than zero, the claim for statistical significance of the 
single ε4 allele of the ApoE gene in the findings of Kervinen et al. (1994) cannot be sustained. As 
the method of Kervinen et al., used in a plethora of similar experiments reviewed by Hirschhorn et 
al. 2002, does not provide a means to recognize genes carrying alleles that encode risk for a 
mortal disease, applicant respectfully argues that it should not be used to deny applicant's claims 
for a distinct method that can recognize such genes. 
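
The arithmetic applied to the Kervinen et al. counts can be checked in a few lines. This sketch uses only the four counts and the multiplier of 5.5 stated above; the variable names are illustrative. Because it carries the standard deviation unrounded (about 0.0302 rather than 0.03), it gives approximately -0.066 rather than -0.065. The sign, and therefore the conclusion, is the same.

```python
import math

# Counts as stated above for Table 3 of Kervinen et al.
a, A = 104, 520   # ε4 alleles and total alleles, young and middle aged
b, B = 19, 190    # ε4 alleles and total alleles, nonagenarians
quant = 5.5       # multiplier for one allele among ~7.4 million

difference = a / A - b / B                          # 0.2 - 0.1
standard_deviation = math.sqrt(a / A**2 + b / B**2)
x = difference - quant * standard_deviation

print(f"difference = {difference:.3f}")
print(f"standard deviation = {standard_deviation:.4f}")
print(f"X = {x:.3f}")
print("significant" if x > 0 else "not significant at this quant")
```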

In contrast, I have applied the same statistical tests to several plausible examples of risk 
for common mortal disease encoded by multiple alleles in the exons and splice sites of any gene 
as taught in both the original and amended claims. These calculations demonstrated to me that 
the claimed methods can determine whether or not a gene or any gene in a set of up to 25,000 
genes carries risk for a common mortal disease and that said condition of risk can be discovered 
by the method of scanning a gene in both young and aged population samples drawn from the 
same large population for all detectable mutations, all nonsynonymous mutations or all 
obligatory gene knockout mutations. 

12. I declare that all statements made in this Declaration of my own knowledge are true and 
that all statements made on information and belief are believed to be true. Moreover, these 
statements are made with the knowledge that willful false statements and the like made by me 
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United 
States Code and that such willful false statements may jeopardize the validity of the application 
or any patent issued thereon. 




William G. Thilly, Sc.D. 



Date: 18 February 2005 

A comprehensive review of genetic association studies 

Joel N. Hirschhorn, MD, PhD, Kirk Lohmueller, Edward Byrne, and Kurt Hirschhorn, MD 

Most common diseases are complex genetic traits, with multiple genetic and environmental 
components contributing to susceptibility. It has been proposed that common genetic variants, 
including single nucleotide polymorphisms (SNPs), influence susceptibility to common disease. 
This proposal has begun to be tested, in numerous studies of association between genetic 
variation at these common DNA polymorphisms and variation in disease susceptibility. We have 
performed an extensive review of such association studies. We find that over 600 positive 
associations between common gene variants and disease have been reported; these associations, 
if correct, would have tremendous importance for the prevention, prediction, and treatment of 
most common diseases. However, most reported associations are not robust; of the 166 putative 
associations which have been studied three or more times, only 6 have been consistently 
replicated. Interestingly, of the remaining 160 associations, well over half were observed again 
one or more times. We discuss the possible reasons for this irreproducibility and suggest 
guidelines for performing and interpreting genetic association studies. In particular, we 
emphasize the need for caution in drawing conclusions from a single report of an association 
between a genetic variant and disease susceptibility. Genet Med 2002:4(2):45-61. 

Key Words: human genetics, association studies, common disease, polymorphisms 



For most common diseases, including heart disease, diabetes, hypertension, and cancer, 
multiple genetic and environmental factors influence an individual's risk of being affected. 
This complexity contrasts with the inheritance pattern of monogenic disorders, in which the 
presence or absence of disease alleles usually completely predicts the presence or absence of 
disease (although the severity or age of onset may vary). For genetically complex diseases, risk 
alleles are less deterministic and more probabilistic: the presence of a high-risk allele may 
only mildly increase the chance of disease. Furthermore, it has been proposed that these weakly 
penetrant alleles may be present at high frequency (>1%) in the population. 

The widespread presence of high frequency variants in humans was first shown experimentally 
by Harris among others, who found that many proteins have several common, heritable isoforms, 
thereby demonstrating that common genetic variation could lead to variation in protein 
structure. The widespread presence of such variation suggested that common variants might be 
biologically important. As Harris hypothesized in 197} (see p. 272), "The other group of alleles, 
though numerically much fewer, are individually much more common. They [common DNA 
variants] provide the basis for the great variety of enzyme . . . polymorphisms which evidently 
occur. These are quite possibly the underlying biochemical cause of much of the inherited 
diversity in the physical and physiological characteristics of individuals, and also in relative 
susceptibilities to various diseases and other disorders." Unfortunately, tests of this hypothesis 
were limited to proteins for which common functional variation could be easily assayed 
(primarily a few enzymes and determinants of blood group antigens). 

From the Whitehead Institute/MIT Center for Genome Research (Cambridge); Divisions of 
Genetics and Endocrinology, Children's Hospital, Boston; Department of Genetics, Harvard 
Medical School, Boston, Massachusetts; and Department of Pediatrics and Human Genetics, 
Mount Sinai School of Medicine, New York, New York. 

Supplementary information (full citations for references and Supplementary Table 1) is 
available online. 

Joel N. Hirschhorn, Whitehead Institute/MIT Center for Genome Research, One Kendall 
Square, Building 300, Cambridge, MA 02139. 

Received: September 2001. Accepted: December 17, 2001. 

The advent of gene cloning and sequencing substantially lowered this technical hurdle. It 
became possible to easily detect DNA variants in a given gene. The first genetic variants 
tested were usually restriction fragment length polymorphisms (RFLPs), but with the 
development of the polymerase chain reaction (PCR) and other improvements in technology, 
microsatellites, variable number tandem repeats (VNTRs), insertion/deletion polymorphisms, 
and single nucleotide polymorphisms (SNPs) could all be analyzed. 

By determining the genotype of these variants in individuals with disease and in unaffected 
controls, these polymorphisms could be tested for association with susceptibility to a variety of 
diseases. Such studies, called "association studies," have usually used a case-control design 
(although family-based designs have also been used; see below). In this design, the frequencies 
of the alleles or genotypes at the site of interest are compared in populations of cases and 
controls; a higher frequency in cases is taken as evidence that the allele or genotype is 
associated with increased risk of disease. The usual conclusion of such studies is that the 
polymorphism being tested either affects risk of disease directly or is a marker for some nearby 
genetic variant that affects risk of disease. 

These association studies were further facilitated by the increasingly rapid discovery of 
common polymorphisms in genes, accomplished by resequencing the same stretch of DNA in 
multiple individuals. One of the goals of the human genome project has been to identify large 
numbers of SNPs; indeed, the number of SNPs in public databases is now well over 1,000,000. 
As we describe below, association studies have already identified over 600 potential 
associations between common genetic variants and susceptibility to common disease. As the 
availability of known polymorphisms skyrockets, so too will the number of reported 
associations. It is, therefore, critical to have a framework in place by which one can evaluate 
and interpret these associations. 

The purpose of this publication is to list and put into perspective many of the examples of 
associations in the recent literature, thereby providing an interim picture of this exciting and 
rapidly developing field. In addition, we examine in detail two illustrative examples: (1) the 
association between deep venous thrombosis and Factor V Leiden, a common polymorphism in 
the gene encoding clotting factor V, and (2) the association between various diseases and a 
common polymorphism in MTHFR, the gene encoding methylene tetrahydrofolate reductase. 
Finally, we will suggest some guidelines for the analysis of association studies, because proper 
evaluation of these associations is critical both to understanding the genetics of common 
disease and to informing recent discussions regarding screening for common genetic disease. 

MATERIALS AND METHODS 

We performed two independent reviews of the literature from 1986 through 2000 to identify 
published significant associations between common diseases or dichotomous traits and common 
polymorphisms in or near genes (sites of genetic variation in which the minor allele frequency 
is at least 1%). We excluded monogenic disorders, because linkage analysis and positional 
cloning methods have been highly successful in identifying the alleles responsible for these 
diseases. Because of the large amount of prior literature, we also did not consider 
polymorphisms in HLA or blood group antigens, even though there are many robust associations 
between variation at these loci and disease. For simplicity, we have only included associations 
between variation at a single locus and susceptibility to disease in the entire population under 
study in the publication. In particular, we have not included associations between pairs of loci 
and susceptibility to disease nor associations between a polymorphism and susceptibility to 
disease in a subgroup of patients (such as smokers or those receiving hormone replacement 
therapy). Thereby we have explicitly ignored reports of gene-gene and gene-environment 
interactions, even though some of these interactions may well be of great biologic and clinical 
interest. Finally, we have not listed associations with substance abuse (where phenotype 
definition is often murky), associations between polymorphisms and variation in laboratory 
findings (such as serum calcium levels), or associations with other quantitative, continuous 
traits (as opposed to dichotomous traits). Associations were considered significant if the 
nominal P value was < 0.05 or if the 95% confidence intervals for relative risk excluded 1.00. 

REVIEW OF THE ASSOCIATION STUDY LITERATURE 

We identified 268 genes that contain polymorphisms reported to be associated with 1 of 133 
common diseases or dichotomous traits. In total, these 268 genes accounted for 603 different 
gene-disease associations. These associations are listed in Table 1, grouped according to the 
trait or disease under study. As seen in Figure 1, the number of new genes associated with 
diseases or traits has risen more or less steadily from 1993 to 2000. The temporary drop-off in 
1999 and early 2000 likely reflects an emphasis on testing newly identified polymorphisms in 
previously studied genes (data not shown). Examination of Table 1 also shows that many genes 
have been associated with several different diseases; for example, polymorphisms in TNF, the 
gene encoding tumor necrosis factor alpha, have been associated with 20 different diseases or 
traits, whereas variants in ACE (encoding angiotensin converting enzyme), VDR (encoding the 
vitamin D receptor), and MTHFR (encoding methylene tetrahydrofolate reductase) have each 
been associated with over a dozen different diseases or traits (see also supplementary Table 1). 
As illustrative examples, we examine in more detail two of the associations in Table 1: the 
association of F5 (clotting factor V) and deep venous thrombosis, and the association between 
MTHFR and a variety of diseases. 

The original report of an association between F5 and deep venous thrombosis grew out of 
observations that resistance to activated protein C, a biochemically defined phenotype, was 
associated with markedly increased risk of deep venous thrombosis. In an elegant study, the 
molecular basis of activated protein C resistance was shown to be a single nucleotide 
polymorphism in F5 encoding an arginine to glutamine change in codon 506 (Factor V Leiden; 
see Bertina et al.). This change occurs at one of the protein C cleavage sites, thereby preventing 
inactivation of factor V by activated protein C and leading to a hypercoagulable state. 
Subsequent studies of this polymorphism have repeatedly demonstrated association with 
susceptibility to deep venous thrombosis, with P values often at or below 10"^ in individual 
studies (for example, Salomon et al.9). These studies were performed in several different 
populations, although the range of populations available for study is limited by the fact that 
Factor V Leiden is uncommon in non-Caucasian populations. Thus this association is extremely 
robust in addition to having high biologic plausibility. 

By contrast, associations involving common variation in MTHFR have not been as 
reproducible. A common thermolabile variant of methylenetetrahydrofolate reductase was first 
described in 1991. Thermolability of the enzyme is inherited as a recessive trait and was 
eventually shown to be due to homozygosity for the T allele at nucleotide 677 (causing an 
alanine to valine substitution; Frosst et al.12). Unlike the rare, more severe 
Table 1

Associations between common polymorphisms in genes and common diseases or dichotomous traits

Disease/trait | Gene (ref)

Cancer
Acute leukemia | CYP1A1 (45), CYP2D6 (45, 46), GSTM1 (45), GSTT1 (45), MTHFR (20), NAT2 (47)
Bladder cancer | GSTM1 (48), GSTP1 (49), GSTT1 (50)
Breast cancer | COMT (51), CYP17 (52), CYP19 (53), CYP1A1 (54), CYP1B1 (55, 56), ERBB2 (57), ESR1 (58), GSTM1 (59), HRAS (60), HSPA8 (61), NAT1 (62), NAT2 (63), PGR (64), SHBG (65), SOD2 (66), TP53 (67), VDR (68)
Cervical cancer | GSTT1 (69), MTHFR (70), TP53 (71)
CLL | ETS1 (72), TNF (73)
Colorectal cancer | ALDH2 (74), APC (75), CYP1A1 (76), DIA4 (77), GSTM1 (78), GSTT1 (79), LTA (80), MSH3 (81), MTHFR (18), NAT1 (82), NAT2 (83), XRCC1 (84)
Endometrial cancer | CDKN1A (85), CYP1A1 (86), MMP1 (87), MTHFR (86), TP53 (88)
Gastric cancer | ALDH2 (74), GSTM1 (89), GSTT1 (90), IL1B (91), MYC (92)
Glioblastoma | PPARG (93)
Head/neck cancer | ADH1B (94), ALDH2 (94), CDKN1A (95), CYP1A1 (96), CYP2D6 (97), CYP2E (98), FCGR3A (99), GSTM1 (100), GSTM3 (101), GSTP1 (102), GSTT1 (101), LTA (103), MYCL1 (104), NAT1 (48), NAT2 (102, 105), TP53 (106)
Hodgkin's lymphoma | HSPA8 (61), TNF (61)
Liver cancer | CYP2D6 (107), CYP2E (108), EPHX1 (109)
Lung cancer | ALDH2 (74), CDKN1A (110), CYP1A1 (111), CYP1B1 (55), CYP2A6 (112), CYP2E (113), DIA4 (114), DIA4 (115), EPHX1 (116), GPX1 (117), GSTM1 (118, 119), HRAS (120), LTA (121), MGMT (122), MPO (123), NAT1 (124, 125), NAT2 (126), TF (127), TP53 (128)
Melanoma | HRAS (129), MC1R (130), XRCC3 (131)
Non-Hodgkin's lymphoma | EPHX1 ([illegible]), ETS1 (132), PGR ([illegible])
Oral leukoplakia | GSTM1 (133, 134), GSTT1 (133, 134)
Oligoastrocytoma | ERCC1 (99)
Ovarian cancer | HRAS (135), TP53 (136)
Prostate cancer | AR (137, 138), CYP17 (139, 140), CYP1A1 (141), CYP1B1 (142), CYP3A4 (143), ELAC2 (144), GSTP1 (49), SRD5A2 (145), VDR (146)
Renal cell cancer | CYP1A1 (147), GSTT1 (148)
Testicular cancer | GSTP1 (49)

Cardiovascular disease
CAD/MI | ACE (149), ADRB3 (150), AGTR1 (151), APOA1 (152), APOB (153), APOE (154), CD14 (155), CYBA (156), F13A1 (157), F2 (158), F5 (159), F7 (160), FGB (161), GP1BA (162), GSTM1 (163), HTR2A (164), IRS1 (165), ITGA2 (166), ITGB3 (167), LPL (168), MMP3 (169), MTHFR (13, 14), NOS3 (170, 171), NPPA (172), PLAT (173), PON1 (174), PON2 (175), PPARG (176), SELE (177), SELP (178), SERPINA8 (179, 180), SERPINE1 (181), TGFB1 (182), THBD (183), WRN (184)
DVT | F13A1 (185), F2 (186), F3 (187), F5 (7), MTHFR (19), PLAT (188), PON1 (189)
Dilated cardiomyopathy | ACE (190), EDNRA (191), PLA2G7 (170), SOD2 (192)
HTN | ACE (193), ADD1 (194), AGTR1 (195), CYP11B2 (196), DIA4 (197, 198), DRD1 (199), GCK (200), GNAS1 (201), GNB3 (202), GYS1 (203), HSD11B2 (204), INSR (205), MTHFR (206), NPPA (172), REN (207), SAH (208), SCNN1B (209), SERPINA8 (210), TGFB1 (182), TH (211)
Survival post-CHF | ADRB2 (212), AMPD1 (213)

Dermatology
Acne | MUC1 (214)
Contact dermatitis | NAT2 (215)
Eczema | CMA1 (216)
Psoriasis | C4A (217), CDSN (218), LTA (219), OTF3 (220), SERPINA8 (219), TAP1 (221), TNF (222, 223), VDR (224)
Table 1

(Continued)

Disease/trait | Gene (ref)

Endocrinology
Addison's disease | CTLA4 (225)
Gestational DM | INSR (226)
Graves' disease | CTLA4 (227), IFNG (228), IL4 (229), TAP1 (230), THRB (231), TRHR (232), VDR (233)
Hyperparathyroidism | VDR (234)
Male infertility | AR (235), LHB (236)
Obesity | ABCC8 (237), ADRB2 (238), ADRB3 (239), APOB (240), APOD (241), GNB3 (242), LDLR (243), LEP (244), LIPE (245), NMB (246), NPY5R (247), PPARG (248), TNF (249)
Osteoporosis/fracture | COL1A1 (250), TGFB1 (251), VDR (252)
PCOS | CYP11A (253), CYP17 (254), FSHB (255), FST (256), INS (257), LHB (258)
Short stature | DRD2 (259), VDR (260, 261)
Type 1 diabetes | BCL2 (262), C4A (263), CCR2 (264), CD3D (265), CD4 (265), CTLA4 (266), GCK (267), ICAM1 (268, 269), IFNG (270), IGHV2-5 (271), IL6 (272), INS (273), LTA (274), NEUROD1 (275), PSMB8 (276), VDR (277), WFS1 (278)
Type 2 diabetes | ABCC8 (279), ACE (280), ADRB2 (281, 282), CD4 (283), FRDA (284), GCGR (285, 286), GCK (287, 288), GYS1 (289), HFE (290), INS (291), INSR (292, 293), IPF1 (294), IRS1 (295), KCNJ11 (296), PCSK2 (297), PPARG (37), PPP1R3 (298), RRAD (299), SLC2A1 (300), SLC2A2 (301), TCF1 (302), UCP3 (303)

Gastroenterology
Celiac disease | CTLA4 (304), TNF (305)
Cholelithiasis | APOB (306), CETP (307)
IBD | BDKRB1 (308), F5 (309), IL10 (310), IL1RN (311), MLH1 (312), MTHFR (313), MUC3A (314), TNF (315), VDR (316)
Pancreatitis | IL1RN (317)
Primary biliary cirrhosis | CTLA4 (318), VDR (319)

Infectious disease
Cerebral malaria | CD36 (320), ICAM1 (321), NOS2A (322), TNF (323)
HIV infection/AIDS | CCR2 (324), CCR5 (325, 326), CX3CR1 (327), MBL2 (328), SDF1 (329), SLC11A1 (330)
Leishmaniasis | TNF (331)
Leprosy | TNF (332), VDR (333)
Meningococcal disease | FCGR2A (334), SERPINE1 (335), TNF (336)
Parasitic infections | ADRB2 (337), NOS2A (338)
RSV bronchiolitis | IL8 (339)
Severe sepsis | IL1RN (340)
Trachoma | IL10 (341), TNF (342)
Tuberculosis | SLC11A1 (343)
Viral hepatitis | MBL2 (344), TNF (345)

Miscellaneous
Athletic endurance | ACE (346)
Benzene toxicity | DIA4 (347)
Fair skin, red hair | MC1R (348)
High altitude HTN | ACE (349)
Lead poisoning | ALAD (350)
Longevity | ACE (351), APOA1 (352), APOB (353), APOE (354), SERPINE1 (355)
Macular degeneration | APOE (356), EPHX1 (357), SOD2 (357)
Tobacco use | DRD2 (358), SLC6A3 (359)
Trichloroethylene toxicity | GSTM1 (360), GSTT1 (360)

Neonatal disease
Cleft lip/palate | BCL3 (361), MSX1 (362), RARA (363), TGFA (364), TGFB2 (365), TGFB3 (362)
Neural tube defect | MTHFR (16, 17), MTR (366), T (367)
Pyloric stenosis | NOS1 (368)
RDS | SFTPA1 (369, 370)
Table 1

(Continued)

Disease/trait | Gene (ref)

Neurology
Absence seizures | GABRB3 (371), OPRM1 (372), SLC6A3 (373)
Alzheimer's disease | A2M (374, 375), ACE (376), APBB1 (377), APOA4 (378), APOC1 (379), APOC2 (380), APOE (381), BCHE (382), BLMH (383), CTSD (384), HTR6 (385), IL1A (386), LRP1 (387), NOS3 (388), PSEN1 (389), SERPINA3 (390), SLC6A4 (391), TF (392), TFCP2 (393), TGFB1 (394), TNFRSF6 (395), VLDLR (396)
Creutzfeldt-Jakob disease | PRNP (397)
Epilepsy | CHRNA4 (398)
Guillain-Barré syndrome | TNF (399)
Head injury outcome | APOE (400)
Hydrocephalus | APOE (401)
Intracranial aneurysms | ACE (402), ENG (403), MMP9 (404)
Ischemic stroke | ACE (405), APOE (406), CYBA (407), ENG (408), F13A1 (409), F2 (410), FGB (411), GP1BA (162), ITGA2 (412), MTHFR (413, 414), NOS3 (415), NPPA (416), PLA2G7 (417), PON1 (418)
Migraine headache | DBH (419), MTHFR (420), SLC6A4 (421)
Multiple sclerosis | CTLA4 (422), IL1RN (423), MBL2 (424), PTPRC (425)
Myasthenia gravis | FCGR2A (426), IL1B (427), TNF (428)
Otosclerosis | COL1A1 (429)
Parkinson's disease | [most entries illegible], UCHL1 (449)

Obstetric disease
Endometriosis | ESR1 (450)
Fetal loss | ACE (451), [several entries illegible], F5 ([illegible]), MTHFR ([illegible])
Preeclampsia | F5 (459), LPL (460), MTHFR (461), TNF (464)

Pharmacogenetics
Albuterol response | [illegible]
Antidepressant response | [illegible]
Aspirin response | ITGB3 (467)
Azathioprine toxicity | TPMT (468)
Beta-blocker response | GNAS1 (201)
Clozapine response | DRD3 (469), HTR2A (471), [illegible] (473), TNF (474)
Drug-induced tardive dyskinesia | CYP2D6 (475, 476), DRD2 (477), DRD3 (478), HTR2C (479), SOD2 (480)
Fluvastatin response | APOB (481)
Fluvoxamine response | SLC6A4 (482)
Irinotecan toxicity | UGT1A1 (483)
Leukotriene inhibitor response | ALOX5 (484)
Lithium response | IMPA1 (485)
Menadione-associated urolithiasis | DIA4 (486)
Omeprazole response | CYP2C19 (487, 488)
Pravastatin response | CETP (489), MMP3 (490)
Tacrine response | APOE (491)
Tricyclic antidepressant response | CYP2D6 (492)
Warfarin response | CYP2C9 (493)

(Continued)



which cause homocystinuria, the variant was not associated
with neurologic deficits. However, thermolability of enzyme
activity was observed to be associated with altered homocysteine
levels and risk of coronary artery disease, findings that
were confirmed in at least one subsequent study that looked at
nucleotide 677 (see Gallagher et al. and Kluijtmans et al.13,14).
Folate metabolism and homocysteine levels are connected with
several clinical disorders, including coronary artery disease,
deep venous thrombosis, neural tube defects, and cancer (see
Bailey and Gregory for review); the thermolabile variant has
been associated in different studies with increased risk of each
of these diseases. However, despite the biologic plausibility
of these associations, none have been reproducibly observed
across many studies (for example, Ma et al.).

If all of the associations listed in Table 1 could be replicated
as consistently as factor V Leiden and deep venous thrombosis,
Table 1

(Continued)

Disease/trait | Gene (ref)

Psychiatry
Anorexia | HTR2A (494)
ADHD | COMT (495), DRD4 (496), DRD5 (497), SLC6A3 (498), HTR2A (499), SNAP25 (500)
Autism | ADA (501), EN2 (502), FMR1 (503)
Bipolar disorder | APOE (504), ATP1A3 (505), COMT (506), DDC (507), DRD3 (508), GABRA5 (509), HTR5A (510), HTR6 (511), MAOA (512), MAOB (513), PLA2G1B (514), PLCG1 (515), SERPINA8 (516), SLC6A4 (517), TPH (518)
Compulsive gambling | DRD2 (519), DRD4 (520)
Depression | ACE (521), COMT (522), DRD3 (523), DRD4 (524), GNB3 (466), HTR5A (510), SLC6A4 (525), TPH (526)
OCD | DRD4 (527), HTR1B (528), HTR2A (529), SLC6A4 (530)
Panic disorder | ADORA2A (531), CCK (532)
Schizophrenia | APOE (533), CCK (534), CCKBR (535), COMT (536), DRD2 (537), DRD3 (538), DRD4 (539), DRD5 (540), GNAL (541), HMBS (542), HRH2 (543), HTR2A (544), HTR5A (510), HTR6 (545), KCNN3 (546), NTF3 (547), OPRS1 (548), PLA2G4A (549), PLA2G7 (550), YWHAH (551)

Pulmonary disease
Asthma/atopy | ACE (552), ADRB2 (553), CCR5 (554), CFTR (555), GSTP1 (556), HNMT (557), IL10 (558), IL13 (559), IL4 (560), IL4R (561), IL9R (562), LTA (563), MS4A1 (564), NOS1 (565), NOS3 (566), PLA2G7 (567), SCYA5 (568), SERPINA8 (569), TAP1 (570), TAP2 (571), TBXA2R (572), TNF (563), UGB (573)
COPD/emphysema | CFTR (574), EPHX1 (575), GC (576), GSTP1 (577), SERPINA1 (578), SERPINA3 (579), TNF (580)
Pneumoconiosis | TNF (581)
Pulmonary fibrosis | TGFB1 (582)
Pulmonary embolism | FGA (583)
Sarcoidosis | ACE (584), CCR2 (585), CCR5 (586), SLC11A1 (587), VDR (588)

Renal/urologic disease
IgA nephropathy | TRA@ (589)
Nephrotic syndrome | SERPINA1 (590)
Renal failure | BDKRB1 (591), DCP1 (592), HSD11B2 (593), KLKB1 (594), NOS3 (595), SERPINA8 (592)
Urolithiasis | DIA4 (486)

Rheumatology
Behcet's disease | ICAM1 (596)
Intervertebral disc disease | COL9A2 (597)
Juvenile chronic arthritis | IL6 (598), TAP2 (599)
JRA | SLC11A1 (600)
Osteoarthritis | COL2A1 (601), VDR (602)
Rheumatoid arthritis | CRH (603, 604), ESR1 (605), HSPA1A (606), IFNG (607), SLC11A1 (608), TAP2 (609), TRD@ (610), XRCC3 (611, 612)
Sjogren's syndrome | GSTM1 (613)
SLE | ACE (614), ADPRT (615), BCL2 (262), C4A (427), C4B (616), CTLA4 (617), CYP2D6 (618), FCGR2A (619), HSPA2 (620), IGHV3-30-5 (621), IL10 (622), MBL2 (623), TNF (624), VDR (625)
Wegener's granulomatosis | CTLA4 (626), PRTN3 (627)

DVT, deep vein thrombosis; IgA, immunoglobulin A.



this list would represent a significant understanding of the
etiologies of most of the major human diseases. However, genetic
associations more often behave like those seen with MTHFR:
they are not consistently reproducible. To determine what
fraction of the associations in Table 1 were robust, we first
identified those associations for which an assessment of
reproducibility could be made. These 166 associations (those for
which we could find and review at least three separate
publications) are listed in Table 2. Where more than one
polymorphism in a gene was studied, the polymorphisms were treated
separately. Although a significant effort was made to be
complete, there are undoubtedly some well-studied associations
that are not listed in Table 2. Nevertheless, we believe that this
list is a reasonably accurate representation of the state of
published association studies between polymorphisms and
common genetic disease.

Fig. 1 The number of new, previously unreported, significant associations between
diseases or dichotomous traits and genes is plotted for each year from 1981 through 2000.
The graph does not include new associations between a disease or trait and
polymorphisms in a gene for which other polymorphisms had previously been significantly
associated with that disease or trait.

We reviewed the 166 associations in Table 2 to determine 
whether other studies of the same polymorphism and disease 
also reached statistical significance. Only six associations were 
reproduced at a high level of consistency (statistical signifi- 
cance was achieved in 75% or more of all identified studies). 
These six associations are listed in Table 3. The possibility of 
publication bias and consequent omission of "negative" studies
means that six is actually an upper limit for the number of
consistently reproducible associations. Of the associations in
Table 3, the most reproducible was the association of ApoE4
and Alzheimer's disease, for which dozens of reports reach
statistical significance. It should be noted, however, that the
association is most robust in Caucasians (all identified reports
achieved statistical significance); for other ethnic groups
(Africans, African-Americans, and Hispanics), the association is
sometimes more difficult to demonstrate.

What could be the cause of the irreproducibility that char- 
acterizes the vast majority of association studies? One possibil- 
ity is that the original observations represent statistical fluctu- 
ations (type I error). If this were the case, one would predict 
that only 5% of subsequent studies would also reach statistical 
significance with P < 0.05, and most associations would never 
be observed again. However, of the 166 associations listed in 
Table 2, at least 97 were observed again, many of them multiple 
times. Thus in the absence of a massive publication bias (selec- 
tive publication of positive results with numerous negative 
studies remaining unpublished), statistical fluctuation is
unlikely to explain all of the initial positive reports in Table 2.
Other possible causes of false-positive association studies
have been previously identified and include ethnic admixture
resulting in population stratification, variable linkage
disequilibrium between the polymorphism being studied and the true
causal variant, and population-specific gene-gene or
gene-environment interactions. Each of these issues is addressed
briefly in turn below, and possible remedies are offered. Fi- 
nally, we examine the possibility that weak genetic effects com- 
bined with underpowered studies lead to significant numbers 
of falsely negative reports. 
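
The type I error argument above can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that are not in the text: every one of the 166 associations is taken to have exactly two independent follow-up studies (the minimum implied by the three-publication criterion), no true effect exists, and a follow-up counts as a replication whenever it reaches P < 0.05. The function names are hypothetical.

```python
from math import comb


def prob_any_positive(n_followups: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null follow-ups reaches P < alpha."""
    return 1.0 - (1.0 - alpha) ** n_followups


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


N_ASSOCIATIONS = 166  # associations listed in Table 2
N_REPLICATED = 97     # associations reported as observed again
N_FOLLOWUPS = 2       # assumption: two follow-up studies per association

p_replicated_by_chance = prob_any_positive(N_FOLLOWUPS)
expected_by_chance = N_ASSOCIATIONS * p_replicated_by_chance
tail_probability = binomial_upper_tail(
    N_ASSOCIATIONS, N_REPLICATED, p_replicated_by_chance
)

print(f"chance of a spurious replication per association: {p_replicated_by_chance:.4f}")
print(f"expected spurious replications out of {N_ASSOCIATIONS}: {expected_by_chance:.1f}")
print(f"probability of {N_REPLICATED} or more by chance alone: {tail_probability:.3g}")
```

Under these assumptions roughly 16 of the 166 associations would be expected to replicate by chance, far fewer than the 97 reported. The expected count rises with the number of follow-up studies per association and with selective publication, which is why the text conditions its conclusion on the absence of a massive publication bias.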

POPULATION STRATIFICATION

Most association studies have a case-control study design, in 
which allele or genotype frequencies in patients are compared 
with frequencies in an unaffected control population (Fig. 2a). 
This study design is subject to population stratification due to 
ethnic admixture, which occurs when the cases and controls are
unintentionally drawn from two or more ethnic groups or subgroups.
If one of these subgroups has a higher disease prevalence
than the others, stratification occurs, because that subgroup will
be overrepresented in the cases and underrepresented in the controls.
Any polymorphism that genetically marks the high-risk subgroup
(i.e., is found by chance at a higher frequency in that subgroup),
therefore, will appear to be associated with disease (Fig.
2b) and will likely be a false positive. Interestingly, the frequencies
of several of the alleles in Table 2 vary substantially between
populations, consistent with the possibility of false associations due to
ethnic admixture. It should be noted that well-defined subgroups
are not necessary to observe stratification; stratification can also
occur in a single admixed population where the individuals have
varying degrees of genetic contributions from two or more ethnic
groups. Even apparently homogeneous, isolated populations
(such as Iceland) are in theory susceptible to admixture if there
have been multiple distinct waves of migration from different 
source populations (e.g., Celtic and Norse, in the case of Iceland). 
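
The stratification mechanism described above can be illustrated numerically. The following is a minimal sketch with invented numbers, not data from any study: two subgroups are mixed, the allele has no effect on disease inside either subgroup, yet the pooled case-control comparison shows an apparent association. The function names and all parameter values are hypothetical.

```python
def allele_freq_in_cases_and_controls(subgroups):
    """subgroups: list of (population_share, disease_prevalence, allele_frequency).

    The allele is assumed to have no effect on disease inside any subgroup.
    Returns (allele frequency among cases, allele frequency among controls).
    """
    case_weights = [share * prev for share, prev, _ in subgroups]
    control_weights = [share * (1.0 - prev) for share, prev, _ in subgroups]
    freqs = [freq for _, _, freq in subgroups]
    case_freq = sum(w * f for w, f in zip(case_weights, freqs)) / sum(case_weights)
    control_freq = sum(w * f for w, f in zip(control_weights, freqs)) / sum(control_weights)
    return case_freq, control_freq


def allelic_odds_ratio(case_freq, control_freq):
    """Odds of carrying the allele in cases relative to controls."""
    return (case_freq / (1.0 - case_freq)) / (control_freq / (1.0 - control_freq))


# Hypothetical admixed sample: subgroup A has a higher disease prevalence and,
# by chance, a higher frequency of the marker allele than subgroup B.
admixed = [(0.5, 0.10, 0.60), (0.5, 0.02, 0.20)]
case_freq, control_freq = allele_freq_in_cases_and_controls(admixed)
spurious_or = allelic_odds_ratio(case_freq, control_freq)

# Same allele and prevalence in a single homogeneous population: no association.
homogeneous = [(1.0, 0.10, 0.60)]
null_or = allelic_odds_ratio(*allele_freq_in_cases_and_controls(homogeneous))

print(f"allele frequency in cases {case_freq:.3f}, in controls {control_freq:.3f}")
print(f"apparent odds ratio in admixed sample: {spurious_or:.2f}")
print(f"odds ratio in homogeneous sample: {null_or:.2f}")
```

With these made-up inputs the high-prevalence subgroup supplies most of the cases, so its allele frequency dominates the case group and the pooled odds ratio moves well away from 1 even though the allele is causally inert.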

What steps can be taken to prevent false-positive associa- 
tions due to population stratification? Currently, two solutions 
can be attempted. First, one can use family-based studies such 
as the transmission disequilibrium test. This method, abbreviated
TDT, requires affected offspring and their parents to test
an allele for association with disease; the frequency with which
heterozygous parents transmit that allele to offspring is then
determined. This frequency is compared with the Mendelian
expectation of 50:50 transmission of the allele. TDT (like other
family-based methods) is immune to false positives from ethnic
admixture. Disadvantages of the TDT are that family-based
samples are often difficult to collect and that 50% more
genotyping is required than in case-control studies to achieve
similar power (the exact loss of power depends on the underlying
genetic model). Another possibility is to study multiple
case-control populations, each from different ethnic groups,
and require that an association be seen in each population.
Finally, an approach to detect and correct for stratification has
been proposed: by typing several dozen random markers, one
can empirically determine the degree of stratification in a
case-control study. If significant stratification is detected, one
can use these markers to more carefully match cases and controls
to remove the effects of stratification. There is some
debate as to whether stratification is a significant problem;
Table 2

Disease-polymorphism associations for which at least three studies were identified

Disease/trait | Gene | Polymorphism | Risk allele/genotype | Frequency | Reference

Cancer
Bladder cancer | GSTM1 | null (gene deletion) | null/null | [illegible] | [illegible]
Bladder cancer | GSTT1 | null (gene deletion) | null/null | 0.19 | [illegible]
Bladder cancer | NAT2 | 857G/A = BamHI | A = M3 = slow acetylator | 0.06 | [illegible]
Breast cancer | CYP17 | -34T/C = MspAI | T/C and C/C | 0.55 | [illegible]
Breast cancer | CYP1A1 | 3' C/T (MspI) | site present/site present = C/C | 0.04 | 54
Breast cancer | GSTM1 | null (gene deletion) | null/null | [illegible] | [illegible]
Cervical cancer | TP53 | Pro72Arg | Arg | [illegible] | 71
Colorectal cancer | GSTM1 | null (gene deletion) | null/null | [illegible] | 78
Colorectal cancer | NAT2 | 590G/A = TaqI | G/G (no *6 alleles) | [illegible] | [illegible]
Head/neck cancer | CYP1A1 | Ile462Val | Ile/Val and Val/Val | [illegible] | [illegible]
Head/neck cancer | CYP1A1 | 3' C/T (MspI) | site present = C | [illegible] | [illegible]
Head/neck cancer | CYP2E | 5' RsaI site | site present/site present | [illegible] | [illegible]
Head/neck cancer | GSTM1 | null (gene deletion) | null/null | 0.48 | 100
Head/neck cancer | GSTM3 | A/B (MnlI) | B/B | 0.06 | 101
Head/neck cancer | GSTP1 | Ile104Val = A313G | Ile/Ile | 0.69 | 102
Head/neck cancer | GSTT1 | null/deletion | null/null | 0.17 | 632
Head/neck cancer | NAT2 | 481C/T = KpnI | T/T = *5/*5 = slow acetylator | 0.15 | [illegible]
Head/neck cancer | NAT2 | 590G/A = TaqI | A/A = *6/*6 = slow acetylator | 0.03 | [illegible]
Lung cancer | CYP1A1 | Ile462Val | Val/Val | [illegible] | [illegible]
Lung cancer | CYP1A1 | 3' C/T (MspI) | site present/site present = C/C | 0.11 | [illegible]
Lung cancer | CYP2E | intron 6 DraI | site present carrier | 0.89 | [illegible]
Lung cancer | DIA4 | Ser187Pro | Ser/Ser | 0.45 | [illegible]
Lung cancer | GSTM1 | null (gene deletion) | null/null | 0.47 | [illegible]
Lung cancer | MPO | -463G/A | A/A | [illegible] | [illegible]
Prostate cancer | AR | exon 1 GGN repeat | ≤16 repeats | [illegible] | [illegible]
Prostate cancer | AR | exon 1 CAG repeat | <20 repeats | [illegible] | 138
Prostate cancer | VDR | 1056C/T = TaqI | C/T and T/T | [illegible] | [illegible]
Prostate cancer | VDR | 3' UTR poly-A; S = 14-17, L = 18-24 | S/L and L/L | [illegible] | [illegible]

Cardiovascular disease
CAD/MI | AGTR1 | 1166A/C | C | [illegible] | 151
CAD/MI | APOA1 | 3' PstI | 3.3 kb allele | 0.02 | 152
CAD/MI | APOB | Gln4154Lys = EcoRI | Lys = [illegible] kb allele | 0.11 | 153
CAD/MI | APOB | Arg3611Gln = MspI | Gln = 9.6 kb allele | 0.06 | 636
CAD/MI | APOB | intron 4 PvuII | Site absent | [illegible] | [illegible]
CAD/MI | APOB | XbaI | 8.6 kb allele | [illegible] | 153
CAD/MI | APOE | epsilon 2/3/4 | epsilon 4 | [illegible] | [illegible]
CAD/MI | ACE | intron 16 Ins/Del | Del/Del | [illegible] | 140
CAD/MI | CYBA | His72Tyr | His/His | 0.74 | [illegible]
CAD/MI | F2 | 20210G/A | A | [illegible] | [illegible]
CAD/MI | F7 | Arg353Gln | Arg | [illegible] | [illegible]
CAD/MI | GP1BA | Thr145Met = HPA2a/b | Thr/Met and Met/Met | 0.15 | [illegible]
CAD/MI | ITGB3 | Leu33Pro = PlA1/A2 | Pro = A2 | 0.10 | 167
CAD/MI | LPL | HindIII | 8.7 kb homozygotes | 0.34 | [illegible]
CAD/MI | MTHFR | 677C/T | T/T (thermolabile) | [illegible] | [illegible]
CAD/MI | NOS3 | Glu298Asp | Asp | [illegible] | 171
CAD/MI | NOS3 | intron 4 27 bp repeat | 4 repeats = a allele | 0.10 | [illegible]
CAD/MI | PLAT | intron h Alu Ins/Del | Ins/Ins | 0.30 | [illegible]
CAD/MI | PON1 | Arg192Gln | Arg | [illegible] | 174
CAD/MI | SERPINE1 | 4G/5G in promoter | 4G | [illegible] | 181
CAD/MI | [illegible] | [illegible] | Thr/Thr | 0.38-0.65 | 179, 180
DVT | F2 | 20210G/A | A | 0.01 | 639
DVT | F5 | Arg506Gln | Gln (Leiden) | 0.02 | 7
DVT | MTHFR | 677C/T | T/T (thermolabile) | 0.18 | 19
HTN | ADD1 | Gly460Trp | Trp | 0.12-0.16 | 640, 641
HTN | AGTR1 | 1166A/C | C | 0.28 | 195
HTN | CYP11B2 | 344C/T | T | 0.49 | 196
HTN | ACE | intron 16 Ins/Del | Del/Del | 0.41 | 193
HTN | GNB3 | 825C/T | T | 0.25 | 202
HTN | NOS3 | Glu298Asp | Asp | 0.10-0.12 | 197
HTN | NOS3 | intron 4 27 bp repeat | 4 repeats = a allele | 0.04 | 198
HTN | NPPA | intron 2 HpaII | Site absent | 0.03 | 172
HTN | SERPINA8 | Met235Thr | Thr | 0.35-0.38 | 210
HTN | SERPINA8 | Thr174Met | Met | 0.08 | 210

Dermatology
Juvenile onset psoriasis | TNF | -238G/A | A | 0.04-0.05 | 222, 223

(Continued)
Table 2

(Continued)

Disease/trait | Gene | Polymorphism | Risk allele/genotype | Frequency | Reference

Endocrinology
Graves' disease | CTLA4 | Thr17Ala | Ala | 0.36 | 642
Male infertility | AR | CAG repeat | ≥28 repeats | 0.10 | 235
Obesity | ADRB2 | Gln27Glu | [illegible] | 0.30 | 238
Obesity | ADRB3 | Trp64Arg | Arg | 0.15 | 239
Osteoporosis/fracture | COL1A1 | intron 1 G/T (Sp1 site) | T = s allele | 0.14 | 250
Osteoporosis/fracture | VDR | BsmI site | B/B homozygotes | 0.03 | 252
PCOS | CYP17 | -34T/C = MspAI | C/C and C/T | 0.46 | 254
Type 1 diabetes | CTLA4 | Thr17Ala | Ala | 0.33 | 266
Type 1 diabetes | INS | 5' VNTR | Class I allele | 0.67 | 273
Type 1 diabetes | NEUROD1 | Ala45Thr | Thr | 0.05 | 275
Type 2 diabetes | ABCC8 | exon 22 C/T (codon 761) | T | 0.01-0.03 | 279
Type 2 diabetes | ABCC8 | intron 24 -3T/C | C | 0.43-0.49 | 279
Type 2 diabetes | GCGR | Gly40Ser | Ser | 0.01-0.02 | 285, 286
Type 2 diabetes | GCK | 3' CA repeat | Z+4 allele | 0.12 | 287
Type 2 diabetes | GCK | 5' CA repeat | -2 allele | 0.04 | 643
Type 2 diabetes | INS | VNTR | Class III allele = large | 0.33 | 291
Type 2 diabetes | INSR | SstI | [illegible] kb allele | 0.04-0.06 | 292, 293
Type 2 diabetes | INSR | Val985Met | [illegible] | 0.01 | 644
Type 2 diabetes | IPF1 | Asp76Asn | Asn | 0.01 | 294
Type 2 diabetes | KCNJ11 | Glu23Lys | Lys | 0.37 | 296
Type 2 diabetes | PPARG | Pro12Ala | Pro | 0.91 | 37
Type 2 diabetes | PPP1R3 | Ins/Del in ARE | Del | 0.48 | 298
Type 2 diabetes | SLC2A1 | XbaI | 6.2 kb = site absent | 0.14-0.30 | 645
Type 2 diabetes | SLC2A2 | TaqI | 13 kb = site present | 0.89 | 301
Type 2 diabetes | FRDA | GAA repeat | 10-36 repeats | 0.03-0.04 | 284
Type 2 diabetes | GYS1 | XbaI | A2 = site present | 0.04 | 289

Gastroenterology
IBD | F5 | Arg506Gln | Gln (Leiden) | 0.06 | [illegible]

Infectious disease
HIV infection/AIDS | CCR2 | Val64Ile | Val with AIDS | 0.87 | 324
HIV infection/AIDS | CCR5 | 32 bp Ins/Del | Ins with infection | 0.90-0.91 | 325, 326

Miscellaneous
Overall mortality | APOE | epsilon 2/3/4 | epsilon 4 | 0.22 | 354

Neonatal disease
Cleft lip/palate | TGFA | TaqI | 2.7 kb allele | 0.05 | 364
Cleft lip/palate | TGFA | BamHI | 4.0 kb allele | 0.87 | 364
Neural tube defect | MTHFR | 677C/T | T/T (thermolabile) | 0.02-0.06 | 16, 17
Neural tube defect | T | intron 7 +2T/C | C | 0.30 | 367
Neural tube defect | MTR | 2756A/G | A/A and A/G | 0.90 | 366

Neurology
Alzheimer's disease | A2M | exon 18 5' splice Ins/Del | Del | 0.23 | 374
Alzheimer's disease | A2M | Val1000Ile | Val/Val | 0.07 | 375
Alzheimer's disease | APOE | epsilon 2/3/4 | epsilon 4 | 0.16-0.24 | 646, 647
Alzheimer's disease | BCHE | Ala539Thr (K variant) | Thr | 0.13 | 648
Alzheimer's disease | BLMH | Ile443Val | Val/Val | 0.07 | 383
Alzheimer's disease | CTSD | Ala224Val | Val | 0.07 | 384
Alzheimer's disease | LRP1 | 766T/C (exon 3) | C | 0.80 | 649
Alzheimer's disease | LRP1 | tetranucleotide repeat | 87 bp | 0.36 | 650
Alzheimer's disease | SERPINA3 | Ala15Thr | Ala/Ala | [illegible] | 390
Alzheimer's disease | PSEN1 | 16A/C (intron 8) | A/A | 0.27-0.28 | 389, 651
Alzheimer's disease | VLDLR | 5' UTR CGG repeat | 5 repeats | 0.36 | 396
Creutzfeldt-Jakob disease | PRNP | Met129Val | Met/Met | 0.37 | 397
Ischemic stroke | APOE | epsilon 2/3/4 | epsilon 4 | 0.06 | 652
Ischemic stroke | ACE | intron 16 Ins/Del | Del/Del | 0.22 | [illegible]
Ischemic stroke | F2 | G20210A | A | 0.01 | 410
Ischemic stroke | MTHFR | 677C/T | T/T (thermolabile) | 0.10-0.21 | 413, 414
Ischemic stroke | NOS3 | Glu298Asp | Glu | 0.61 | 415
Multiple sclerosis | MBP | 5' TGGA repeat | ≥2.14 kb alleles | 0.13 | 424
Parkinson's disease | COMT | Val158Met | Met/Met | 0.06 | 653
Parkinson's disease | CYP2D6 | BstNI site | B allele | 0.10 | 435
Parkinson's disease | DRD2 | intron 2 GT repeat | allele 3 = 122 bp | 0.45 | 437
Parkinson's disease | MAOA | intron 2 CT repeat | A4 = 119 bp | 0.06 | 440
Parkinson's disease | MAOB | intron 13 G/A | allele 1 | 0.45 | 441
Parkinson's disease | NAT2 | 481C/T = KpnI | T = *5 = slow acetylator | 0.31 | 443
Parkinson's disease | SERPINA3 | Ala15Thr | Ala/Ala | 0.08 | 445
Parkinson's disease | SLC6A3 | 3' UTR VNTR | 11 repeats | 0.01 | 446

(Continued)
Table 2

(Continued)

Disease/trait | Gene | Polymorphism | Risk allele/genotype | Frequency | Reference

Obstetric disease
Preeclampsia | F5 | Arg506Gln | Gln (Leiden) | 0.02 | 459
Preeclampsia | MTHFR | 677C/T | T | 0.11 | 461

Pharmacogenetics
Clozapine response | DRD3 | Ser9Gly | Ser/Ser with no response | 0.35 | 469
Clozapine response | HTR2A | 102T/C | T with no response | 0.46 | 471
Clozapine response | HTR2A | His452Tyr | Tyr with no response | 0.07 | 654
Drug-induced tardive dyskinesia | DRD3 | Ser9Gly | Gly/Gly with dyskinesia | 0.04 | 478
Tacrine response | APOE | epsilon 2/3/4 | epsilon 4 with no response | 0.41 | 491

Psychiatry
Anorexia | HTR2A | -1438A/G | G | 0.41 | 494
ADHD | DRD4 | exon 3 VNTR | ≥7 repeats | 0.12 | 496
Bipolar disorder | COMT | Val158Met | Met | 0.18 | 506
Bipolar disorder | MAOA | CA repeat | [illegible] is protective | 0.21 | 655
Bipolar disorder | MAOA | 5' VNTR | v1-v3 (long alleles) | 0.61 | 512
Bipolar disorder | MAOA | 941T/G | [illegible] | 0.65 | 512
Bipolar disorder | SLC6A4 | intron 2 VNTR | 12 repeats | 0.54 | 517
Bipolar disorder | TPH | intron 7 218A/C | C | 0.36 | 518
Depression | COMT | Val158Met | Met/Val and Val/Val | 0.57 | 522
Depression | SLC6A4 | intron 2 VNTR | 9 repeats | 0.01 | 525, 656
Depression | SLC6A4 | 5' Ins/Del (5HTTLPR) | Del/Del | 0.18 | 657
OCD | SLC6A4 | 5' Ins/Del (5HTTLPR) | Ins | 0.01 | 530
Schizophrenia | APOE | epsilon 2/3/4 | epsilon 4 | 0.15 | 533
Schizophrenia | COMT | Val158Met | Val | 0.68 | 536
Schizophrenia | DRD2 | -141C Ins/Del | Ins | 0.78 | 537
Schizophrenia | DRD3 | Ser9Gly | Ser/Ser | 0.37 | 538
Schizophrenia | HMBS | intron 1 ApaLI | At least one site present | 0.69 | 542
Schizophrenia | KCNN3 | second CAG repeat | >19 repeats | 0.14 | 658
Schizophrenia | HTR2A | 102T/C | C | 0.39-0.56 | 544, 659, 660
Schizophrenia | NTF3 | 5' dinucleotide repeat | A3 = 147 bp | 0.20 | 547

Pulmonary disease
Asthma/atopy | ACE | intron 16 Ins/Del | Del/Del | 0.28 | 552
Asthma/atopy | IL4 | [illegible] | T | 0.70 | 560
Asthma/atopy | IL4R | Gln576Arg | Arg | 0.10 | 561
Asthma/atopy | IL4R | Ile50Val | Ile | 0.40 | 661
Asthma/atopy | LTA | intron 1 NcoI | 5.5 kb = allele 1 | 0.33 | 563
Asthma/atopy | MS4A1 | Ile181Leu | Leu/Leu and Ile/Leu | 0.12 | 564
Asthma/atopy | MS4A1 | Gly237Glu | Glu | 0.03 | 662
Asthma/atopy | TNF | -308G/A | A | 0.18 | 563
COPD/emphysema | EPHX1 | Tyr113His | His/His | 0.06 | 575
COPD/emphysema | TNF | -308G/A | A | 0.02 | 580
COPD/emphysema | SERPINA1 | TaqI | 2.4 kb allele = T2 | 0.02 | 578

Rheumatology
SLE | CTLA4 | Thr17Ala | Ala | 0.26 | 617
SLE | FCGR2A | His131Arg | Arg | 0.45-0.48 | 619
SLE | MBL2 | Gly54Asp | Asp | 0.09 | 623
SLE | TNF | -308G/A | A | 0.11 | 663


For each disease/trait, genes and polymorphisms within those genes are listed if there are at least three studies (and at least one achieving statistical significance) that test
association between the polymorphism and the disease or trait. Gene symbols are as in Table 4. Associations with at least one replication (more than one report achieving
statistical significance) are indicated in boldface. For describing the polymorphisms, standard amino acid abbreviations are used for missense polymorphisms and the start
codon is numbered as 1. Where nucleotides are used to describe the polymorphism, numbering is as used in the studies and may refer to the start site of translation,
transcription, or intron/exon boundary, depending on the context. Other types of polymorphisms include VNTRs (variable number tandem repeat); di-, tri-, or tetranucleotide
repeats; Ins/Del (insertion deletion) polymorphisms; restriction fragment length polymorphisms (indicated by the restriction enzyme used); polynucleotide tracts;
or polymorphisms in the UTR (untranslated region). The allele(s) or genotype(s) con[...]

[...] allele(s) or genotype(s) is indicated. The final column gives the first identified reference(s) reporting a significant association between the polymorphism and disease/trait.
Full citations can be found at www.geneticsinmedicine.org. IBD, inflammatory bowel disease. For other abbreviations, see Table 1.



some authors believe that even minimal ethnic matching of
cases and controls is adequate to prevent stratification. However,
there are as yet no empirical data that address the degree of
stratification found in a typical association study.
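
To make the stratification problem concrete, the following is a minimal sketch rather than an analysis of real data. All carrier frequencies and sample sizes are hypothetical values chosen for illustration. The allele is equally common in cases and controls within each ethnic group, yet pooling the groups produces an apparent excess in cases because one group is overrepresented among the cases.

```python
# Hypothetical numbers (not from any study): the allele's carrier frequency
# depends only on ethnic group, never on disease status.
CARRIER_FREQ = {"group1": 0.50, "group2": 0.10}

# Assumed sample make-up: group 1 is overrepresented among the cases.
CASES = {"group1": 800, "group2": 200}
CONTROLS = {"group1": 200, "group2": 800}


def pooled_carrier_frequency(sample):
    """Carrier frequency in a sample that mixes both ethnic groups."""
    carriers = sum(n * CARRIER_FREQ[group] for group, n in sample.items())
    return carriers / sum(sample.values())


case_freq = pooled_carrier_frequency(CASES)
control_freq = pooled_carrier_frequency(CONTROLS)

# Within each group the case and control frequencies are identical by
# construction, yet the pooled comparison shows a spurious excess in cases.
print(f"pooled carrier frequency, cases:    {case_freq:.2f}")
print(f"pooled carrier frequency, controls: {control_freq:.2f}")
```

Comparing cases and controls separately within each group, or using family-based controls, would remove the spurious signal in this toy setting.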

LINKAGE DISEQUILIBRIUM

Failure of replication can also occur if the polymorphism
being tested is not itself the causal variant but is rather in linkage
disequilibrium with the causal variant. Linkage disequilibrium,
in which nearby variants are correlated with each other more
often than expected by chance, depends heavily on population
history and on the genetic makeup of the founders of that
population. If all examples of a particular stretch of DNA in a
population derive from a recent common ancestor, there will
have been few opportunities for recombination events to separate
variants within that stretch of DNA, and the variants will
often be inherited together throughout the population. If, in a
different population, the time since a common ancestor is
Table 3

Highly consistently reproducible associations (≥75% positive studies)

Disease/trait              Gene   Polymorphism    Risk allele/genotype   Frequency   Reference
DVT                        F5     Arg506Gln       Gln (Leiden)           0.015       7
Graves' disease            CTLA4  Thr17Ala        Ala                    0.62        642
Type 1 diabetes            INS    5' VNTR         Class I allele         0.67        273
HIV infection/AIDS         CCR5   32 bp Ins/Del   Del with protection    0.05-0.07   325, 326
Alzheimer's disease        APOE   epsilon 2/3/4   epsilon 4              0.16-0.24   646, 647
Creutzfeldt-Jakob disease  PRNP   Met129Val       Met/Met                0.37        397

Associations between polymorphisms and disease where at least 75% of identified studies achieved statistical significance are shown; the format is as in Table 2. DVT,
deep vein thrombosis.



longer, more recombination events will have occurred, disrupting
linkage disequilibrium in the region. Furthermore, the
particular arrangement of variants in the founders of a population
will determine which variants are inherited together.
Thus, it is possible that a polymorphism will be in linkage
disequilibrium with a nearby disease allele in one population
but not in another, leading to variable results of association
studies. For example, many of the associations with TNF in
Table 1 might reflect associations with nearby HLA loci (HLA
is a region with strong linkage disequilibrium over large distances).
To explore this possibility, positive associations
should be followed up by testing adjacent markers (both individually
and as multi-marker haplotypes). If linkage disequilibrium
is present (and particularly if any of the haplotypes or
adjacent markers show stronger association), the possibility
exists that the original marker tested is not the causal allele, and
further studies of the region are warranted. Although it should
be possible to exhaustively test modest sized regions of linkage
disequilibrium, special circumstances (e.g., recently admixed
populations) may in theory give rise to correlation between
markers at much greater distances.
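
The text does not say how linkage disequilibrium is quantified. As a hedged illustration, the sketch below computes three commonly used pairwise measures (D, D' and r²) from haplotype and allele frequencies. The function name and the example frequencies are assumptions made for illustration and are not taken from the article.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at the first
           locus and allele B at the second locus
    p_a  : frequency of allele A
    p_b  : frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over its expected frequency
    # under independence of the two loci.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest value it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical marker that always travels with allele B but is rarer than it.
tight = linkage_disequilibrium(p_ab=0.2, p_a=0.2, p_b=0.5)
# Hypothetical marker that is independent of allele B.
independent = linkage_disequilibrium(p_ab=0.1, p_a=0.2, p_b=0.5)
```

In the first example D' is at its maximum while r² is well below 1, which is one reason a marker in linkage disequilibrium with a causal allele can show a weaker association than the causal allele itself.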

GENE-GENE AND GENE-ENVIRONMENT INTERACTIONS 

Another potential source of variable findings is gene-gene or
gene-environment interactions that differ between populations.
For example, if the effect of a variant were only manifest in
populations with a particular genetic or environmental background,
then association would only be seen in populations or subgroups
with the appropriate genetic or environmental characteristics.
This explanation is commonly invoked to explain differing results
of association studies but is less frequently supported by direct
evidence. A further problem arises when considering gene-gene
or gene-environment interactions: when combinations of alleles
and/or environmental factors are studied, P values are rarely
corrected for the number of tests reported (much less the number of
tests actually performed). Such "nominally" significant results
must be considered to be the product of hypothesis generation
rather than hypothesis testing and, therefore, require replication.
Perhaps the best possible method of demonstrating that a
gene-environment interaction is likely to be correct (and not a statistical
fluctuation expected when exploring numerous hypotheses) is to
divide the study population randomly into two parts and require
that any findings be observed in both parts of the study. Sample
sizes need to be increased slightly to maintain power, but the
ability to generate and then test hypotheses in the same sample would
seem to outweigh this consideration. Otherwise, one requires a
replication population that is exactly matched for environmental
and genetic background, an extremely [...]
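
The split-sample idea described above can be sketched as follows. This is an illustrative outline only: the function names, the seed, and the specific decision rule (a Bonferroni correction in the exploratory half, a nominal threshold in the confirmatory half) are assumptions and are not prescribed by the text.

```python
import random


def split_population(subject_ids, seed=0):
    """Randomly divide subjects into two halves (sizes differ by at most one)."""
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def finding_holds(p_exploration, p_confirmation, n_hypotheses, alpha=0.05):
    """True if a finding survives in both halves of the study.

    Assumed rule: the exploration half is corrected for every hypothesis
    explored (Bonferroni); the confirmation half tests only the single
    hypothesis carried forward, so the nominal threshold applies there.
    """
    explored_ok = p_exploration < alpha / n_hypotheses
    confirmed_ok = p_confirmation < alpha
    return explored_ok and confirmed_ok


# Hypothetical study of 1000 subjects, identified here by integers.
exploration, confirmation = split_population(range(1000), seed=42)
```

For example, with 100 allele-environment combinations explored, `finding_holds(0.0004, 0.03, 100)` accepts the finding, while `finding_holds(0.01, 0.03, 100)` rejects it because the exploratory P value does not survive correction.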



WEAK GENETIC EFFECTS AND LACK OF POWER

Finally, associations can be real but nonetheless not reproducible
if the underlying genetic effect is weak. If the subsequent
studies are small in size, they will be underpowered to
reliably detect weak effects and, therefore, fail to achieve
statistical significance. This difficulty is heightened by the "jackpot"
effect, in which the first group to publish a significant association
involving a weak locus is more likely to have overestimated
than underestimated the true effect of the polymorphism.
This phenomenon occurs because each study
imprecisely estimates the strength of the effect (due to sampling
variation). Because a weak effect would in most cases not
provide a statistically significant finding in a typically sized
study (a few hundred cases and controls), the first published
study that does manage to achieve statistical significance is
almost certain to have overestimated the true effect of the variant
being tested. Subsequent studies thus need to include much
larger numbers of patients to achieve statistical significance. In
particular, failure to observe the magnitude of effect seen in the
first study should not be taken as a repudiation of the association.
We observed this phenomenon for the association of type
2 diabetes and a Pro12Ala polymorphism in the PPARG gene,
where an initial study estimated the effect on diabetes risk to be
threefold, but subsequent studies observed very modest risks
that usually did not achieve statistical significance. We
tested the variant in several large populations and found that
the effect on diabetes risk was modest (1.25-fold) but significant
(P = 0.002 in our data alone). Indeed, all of the previous
studies, both positive and negative, were consistent with this
1.25-fold effect, and two subsequent large studies confirmed
this association. Because many alleles may have similarly
weak genetic effects, large studies and/or meta-analyses of
multiple studies will often be required to determine whether
genetic associations between polymorphisms and disease are
significant.
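
The "jackpot" effect can be illustrated with a small simulation. The sketch below is not the authors' analysis. The study size, control allele frequency, number of simulated studies, and random seed are assumptions chosen for illustration, and only the 1.25-fold true effect echoes the example in the text. Each simulated study compares allele counts in cases and controls, and only studies reaching significance in the risk direction are treated as positive reports.

```python
import math
import random


def simulate_study(rng, n_per_group, control_freq, true_or):
    """Simulate one allele-count case-control study.

    Returns (estimated odds ratio, whether the excess in cases is significant).
    """
    n_alleles = 2 * n_per_group
    odds = true_or * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    a = sum(rng.random() < case_freq for _ in range(n_alleles))     # risk alleles, cases
    c = sum(rng.random() < control_freq for _ in range(n_alleles))  # risk alleles, controls
    b, d = n_alleles - a, n_alleles - c
    # Adding 0.5 to every cell avoids division by zero in extreme samples.
    a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    significant = log_or / se > 1.96
    return math.exp(log_or), significant


TRUE_OR = 1.25
rng = random.Random(2002)
results = [
    simulate_study(rng, n_per_group=200, control_freq=0.30, true_or=TRUE_OR)
    for _ in range(1000)
]
positive = [estimate for estimate, significant in results if significant]
mean_all = sum(estimate for estimate, _ in results) / len(results)
mean_positive = sum(positive) / len(positive)

print(f"true odds ratio:                        {TRUE_OR}")
print(f"mean estimate, all studies:             {mean_all:.2f}")
print(f"mean estimate, significant studies only: {mean_positive:.2f}")
print(f"fraction of studies significant:        {len(positive) / len(results):.2f}")
```

Under these assumptions a study can only reach significance if its estimated odds ratio lies well above the true value, so every positive report overestimates the effect, while most studies of this size fail to reach significance at all.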
Table 4

Gene symbols with OMIM numbers and aliases/descriptions

Gene symbol   OMIM #    Aliases/descriptions
A2M           103950
ABCC8
ACE           106180    Angiotensin converting enzyme 1; DCP1; dipeptidyl carboxypeptidase 1
ACP1          171500    Acid phosphatase 1, soluble (erythrocyte)
ADA
ADD1          102680
ADH1B         103720    ADH2; class I alcohol dehydrogenase, beta polypeptide
ADH4          103740    Alcohol dehydrogenase 4
ADORA2A       102776    Adenosine A2a receptor; ADORA2; RDC8
ADPRT         173870    ADP-ribosyltransferase; poly(ADP) ribose polymerase; PARP
ADRB2         109690    Beta 2 adrenergic receptor
ADRB3         109691
AGTR1         106165    Angiotensin II receptor, type 1
ALAD          125270
ALDH2                   Aldehyde dehydrogenase 2
ALOX5                   Arachidonate 5-lipoxygenase
AMPD1         102770    Adenosine monophosphate deaminase 1; MADA
APBB1         602709    Amyloid beta precursor protein-binding family B, member 1; FE65
APC           175100
APOA1         107680    Apolipoprotein A-I
APOA4         107690    Apolipoprotein A-IV
APOB                    Apolipoprotein B
APOC1         107710    Apolipoprotein C-I
APOC2         207750    Apolipoprotein C-II
APOD          107740    Apolipoprotein D
APOE          107741    Apolipoprotein E
AR            313700    Androgen receptor
ATP1A3                  ATPase, Na+/K+ transporting, alpha 3 polypeptide
BCHE          177400
BCL2                    B-cell CLL/lymphoma 2
BCL3                    B-cell leukemia/lymphoma 3
BDKRB1                  Bradykinin B1 receptor; kinin B1 receptor
BLMH          602403    Bleomycin hydrolase
C4A           120810    Complement component 4A
C4B           120820    Complement component 4B; C4F
CCK           118440    [illegible]
CCKBR         118445    Cholecystokinin B receptor; gastrin receptor
CCR2          601267    Chemokine (C-C motif) receptor 2; CKR2; CMKBR2
CCR5          601373    Chemokine (C-C motif) receptor 5; CKR5; CMKBR5
CD14                    CD14 antigen
CD36          173510    Thrombospondin receptor; collagen type I receptor; fatty acid translocase; platelet glycoprotein IIIb; GPIIIb
CD3D          186790
CD4                     CD4 antigen (p55); T4/LEU3
CDKN1A        116899    Cyclin-dependent kinase inhibitor 1A; p21; CIP1/WAF1
CDSN          602593    S gene (corneodesmosin)
CETP          118470    Cholesteryl ester transfer protein
CFTR          602421    Cystic fibrosis transmembrane conductance regulator; ATP-binding cassette (sub-family C, member 7); ABCC7
CHRNA4        118504    Neuronal nicotinic acetylcholine receptor, alpha-4 subunit
CMA1          118938    Mast cell chymase 1
COL1A1        120150    Collagen, type I, alpha 1
COL2A1        120140    Collagen, type II, alpha 1; [illegible]
COL9A2        600204    Collagen, type IX, alpha 2; EDM2
COMT          116790    Catechol-O-methyltransferase
CRH           122560    Corticotropin releasing hormone
CTLA4         123890    Cytotoxic T-lymphocyte-associated protein 4; [illegible]
CTSD          116840    Cathepsin D; lysosomal aspartyl protease
CX3CR1        601470
CYBA          233690    Cytochrome b-245 alpha; p22-PHOX
CYP11A        118485    Cholesterol side chain cleavage enzyme; cytochrome P450, subfamily XIA
CYP11B2       124080    Aldosterone synthase; steroid 11-beta-hydroxylase; cytochrome P450, subfamily XIB, polypeptide 2
CYP17         202110    17-alpha-hydroxylase; 17,20 lyase; cytochrome P450, subfamily XVII
CYP19         107910    Aromatase; cytochrome P450, subfamily XIX
CYP1A1        108330    Cytochrome P450, subfamily IA, polypeptide 1
CYP1B1        601771    Cytochrome P450, subfamily IB, polypeptide 1
CYP2A6        122720    Cytochrome P450, subfamily [illegible]
CYP2C19       124020    Cytochrome P450, subfamily IIC, polypeptide 19
CYP2C9        601130    Cytochrome P450, subfamily IIC, polypeptide 9
CYP2D6        124030    Cytochrome P450, subfamily IID, polypeptide 6; debrisoquine 4-hydroxylase
CYP2E         124040    Cytochrome P450, subfamily IIE; CYP2E1
CYP3A4        124010    Cytochrome P450, subfamily 3A, polypeptide 4; glucocorticoid-inducible P450
Table 4 (Continued)

Gene symbol   OMIM #    Aliases/descriptions
DBH           223360    Dopamine beta-hydroxylase
DDC           107930    Dopa decarboxylase; aromatic L-amino acid decarboxylase
DIA4          125860    NQO1; Diaphorase; NAD(P)H:quinone oxidoreductase
DLST          126063    Alpha-ketoglutarate dehydrogenase, E2 subunit; dihydrolipoamide S-succinyltransferase
DRD1          126449    Dopamine receptor D1
DRD2          126450    Dopamine receptor D2
DRD3          126451    Dopamine receptor D3
DRD4          126452    Dopamine receptor D4
DRD5          126453    Dopamine receptor D5
EDNRA         131243    Endothelin receptor type A
ELAC2         605367    HPC2; elaC (E. coli) homolog 2; prostate cancer, hereditary, 2
EN2           131310    Engrailed homolog 2
ENG           131195    Endoglin; CD105; Osler-Rendu-Weber syndrome 1
EPHX1         132810    Microsomal epoxide hydrolase 1 (xenobiotic)
ERBB2         164870    HER-2; NEU; v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2; NGL
ERCC1         126380    Excision repair cross-complementing rodent repair deficiency, complementation group 1
ESR1          133430    Estrogen receptor 1; estrogen receptor alpha
ETS1          164720    v-ets avian erythroblastosis virus E26 oncogene homolog 1
F13A1         134570    Coagulation factor XIII, A1 polypeptide
F2            176930    Prothrombin; coagulation factor II
F3            134390    Tissue factor; thromboplastin; coagulation factor III
F5            227400    Coagulation factor V
F7            227500    Coagulation factor VII
FCGR2A        146790    CD32; Fc IgG low affinity IIa receptor
FCGR3A        146740    CD16; Fc fragment of IgG, low affinity receptor IIIa
FGA           134820    Fibrinogen alpha, A polypeptide
FGB           134830    Fibrinogen, B beta polypeptide
FMR1          309550    Fragile X mental retardation 1; FRAXA
FRDA          229300    Frataxin; Friedreich ataxia; X25
FSHB          136530    Follicle stimulating hormone, beta polypeptide
FST           136470    Follistatin
GABRA5        137142    Gamma-aminobutyric acid (GABA) receptor A, alpha 5
GABRB3        137192    Gamma-aminobutyric acid (GABA) receptor A, beta 3
GC            139200    Vitamin D binding protein; group-specific component
GCGR          138033    Glucagon receptor
GCK           138079    Glucokinase; MODY2; hexokinase 4
GNAL          139312    G(olf) alpha; G protein, alpha activating activity polypeptide, olfactory type
GNAS1         139320    G-protein alpha stimulating activity polypeptide 1
GNB3          139130    G-protein beta, polypeptide 3
GP1BA         231200    Platelet glycoprotein Ib, alpha polypeptide
GPX1          138320    Glutathione peroxidase
GRIK1         138245    Glutamate receptor, ionotropic, kainate 1; glutamate receptor 5
GSTM1         138350    Glutathione S-transferase M1; glutathione S-transferase mu-1
GSTM3         138390    Glutathione S-transferase M3 (brain)
GSTP1         134660    Glutathione S-transferase pi; GST3
GSTT1         600436    Glutathione S-transferase theta 1
GYS1          138570    Glycogen synthase (muscle); GYS
HFE           233200    Hemochromatosis; HLAH
HMBS          176000    Porphobilinogen deaminase; hydroxymethylbilane synthase; PBGD
HNMT          605238    Histamine N-methyltransferase
HRAS          190020    v-Ha-ras Harvey rat sarcoma viral oncogene homolog; HRAS1
HRH2          142703    Histamine receptor H2
HSD11B2       218030    11-beta hydroxysteroid dehydrogenase 2; AME
HSPA1A        140550    Heat shock 70kD protein 1A; hsp70-1
HSPA2         140560    Heat shock 70kD protein 2; hsp70-2
HSPA8         600816    Heat shock 70kD protein 8; HSC70
HTR1B         182131    5-hydroxytryptamine (serotonin) receptor 1B; 5HT1D(beta)
HTR2A         182135    5-hydroxytryptamine (serotonin) receptor 2A; HTR2
HTR2C         312861    5-hydroxytryptamine (serotonin) receptor 2C; HTR1C
HTR5A         601305    5-hydroxytryptamine (serotonin) receptor 5A
HTR6          601109    5-hydroxytryptamine (serotonin) receptor 6
ICAM1         147840    Intercellular adhesion molecule 1; CD54
IFNG          147570    Interferon gamma
IGF2          147470    Insulin-like growth factor II; somatomedin A
IGHV3-5       600949    Immunoglobulin heavy chain variable re[...]
IGHV3-30-5    147070    Humhv3005; immunoglobulin heavy chain vari[...]
IL10          124092    Interleukin 10
IL13          147683    Interleukin 13
IL1A          147760    Interleukin 1-alpha
IL1B          147720    Interleukin 1-beta
Table 4 (Continued)

Gene symbol   OMIM #    Aliases/descriptions
IL1RN         147679    Interleukin 1 receptor antagonist; IL1RA
IL4           147780    Interleukin 4; BSF1
IL4R          147781    Interleukin 4 receptor
IL6           147620    Interleukin 6; interferon, beta 2; B-cell differentiation factor; BSF2; HSF
IL8           146930    Interleukin 8; NAP1; SCYB8; monocyte-derived neutrophil chemotactic factor
IL9R          300007    Interleukin 9 receptor
IMPA1         602064    Inositol(myo)-1(or 4)-monophosphatase 1
INS           176730    Insulin
INSR          147670    Insulin receptor
IPF1          600733    Insulin promoter factor 1; PDX1; IDX1; STF1; MODY4
IRS1          147545    Insulin receptor substrate 1
ITGA2         192974    Platelet glycoprotein Ia/IIa; integrin, alpha-2; CD49B; VLA2 receptor, alpha-2 subunit
ITGB3         173470    Glycoprotein IIIa; integrin, beta-3; CD61
KCNJ11        600937    Kir6.2; BIR; potassium inwardly-rectifying channel, subfamily J, member 11
KCNN3         602983    hKCa3; SKCA3; SK3; hSK3; potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3
KLKB1         229000    Kallikrein B, plasma; formerly KLK3
LDLR          143890    Low density lipoprotein receptor; familial hypercholesterolemia
LEP           164160    Leptin; ob
LHB           152780    Luteinizing hormone, beta polypeptide
LIPE          151750    Hormone sensitive lipase
LPL           238600    Lipoprotein lipase
LRP1          107770    Low density lipoprotein-related protein 1; alpha-2-macroglobulin receptor; ApoE receptor
LTA           153440    TNF beta; lymphotoxin A; TNF superfamily, member 1
MAOA          309850    Monoamine oxidase A
MAOB          309860    Monoamine oxidase B; MAO, platelet; MAO, brain
MAPT          157140    Microtubule-associated protein tau; MTBT1
MBL2          154545    Mannose binding lectin; mannose binding protein; MBP1
MBP           159430    Myelin basic protein
MC1R          155555    Melanocortin 1 receptor; alpha melanocyte stimulating hormone receptor; MSHR
MGMT          156569    O-6-methylguanine-DNA methyltransferase
MLH1          120436    MutL (E. coli) homolog 1; colon cancer, nonpolyposis type 2; HNPCC2
MMP1          120353    Matrix metalloproteinase 1; interstitial collagenase
MMP3          185250    Matrix metalloproteinase 3; stromelysin 1; progelatinase
MMP9          120361    Matrix metalloproteinase 9; gelatinase B; 92kD type IV collagenase
MPO           254600    Myeloperoxidase
MS4A1         147138    Fc IgE receptor; membrane-spanning 4-domains, subfamily A, member 1
MSH3          600887    MutS (E. coli) homolog 3
MSX1          142983    Msh (Drosophila) homeo box homolog 1; HOX7; HYD1
MTHFR         236250    5,10-methylenetetrahydrofolate reductase
MTR           156570    Methionine synthase; 5-methyltetrahydrofolate-homocysteine methyltransferase
MUC1          158340    Mucin 1, transmembrane
MUC3A         158371    Mucin 3A, intestinal; MUC3
MYC           190080    v-myc avian myelocytomatosis viral oncogene homolog
MYCL1         164850    L-myc; v-myc avian myelocytomatosis viral oncogene homolog 1, lung carcinoma derived
NAT1          108345    N-acetyltransferase 1; arylamine N-acetyltransferase 1; AAC1
NAT2          243400    N-acetyltransferase 2; arylamine N-acetyltransferase 2; AAC2
NEUROD1       601724    Neurogenic differentiation; beta2
NMB           162340    Neuromedin B
NOS1          163731    Neuronal nitric oxide synthase
NOS2A         163730    Inducible nitric oxide synthase
NOS3          163729    Endothelial nitric oxide synthase; ENOS
NPPA          108780    Natriuretic peptide precursor A; atrial natriuretic polypeptide; ANP; ANF
NPY5R         602001    Neuropeptide Y receptor Y5
NTF3          162660    Neurotrophin 3; neurotrophic factor 3; NT3
OPRM1         600018    Opioid receptor, mu 1
OPRS1         601978    Type I sigma receptor; SR-BP1; sigma receptor (SR31747 binding protein 1)
PCSK2         162151    Prohormone convertase 2; proprotein convertase subtilisin/kexin type 2; PC2
PGR           264080    Progesterone receptor; PR
PLA2G1B       172410    Phospholipase A2, group IB (pancreas); pancreatic phospholipase; PLA2A; PLA2
PLA2G4A       600522    Phospholipase A2, group IVA (cytosolic); cPLA2
PLA2G7        601690    Platelet-activating factor acetylhydrolase; phospholipase A2 group VII
PLAT          173370    TPA; tissue plasminogen activator
PLCG1         172420    Phospholipase C, gamma 1; PLC1; phospholipase C-148; PLC148
PON1          168820    Paraoxonase 1
PON2          602447    Paraoxonase 2
PPARG         601487    Peroxisome proliferator-activated receptor, gamma; PPAR gamma
PPP1R3        600917    Protein phosphatase 1, regulatory (inhibitor) subunit 3
PRNP          176640    Prion protein; PRP
PRTN3         177020    Proteinase 3 (serine proteinase, neutrophil, Wegener's granulomatosis autoantigen); AGP7; p29
Table 4 (Continued)

Gene symbol   OMIM #    Aliases/descriptions
PSEN1         104311    Presenilin 1; PS1; AD3
PSMB8         177046    Proteasome subunit beta type 8; LMP7; large multifunctional protease 7
PTPRC         151460    Protein tyrosine phosphatase, receptor type, C; CD45; Ly5 homolog
RARA          180240    Retinoic acid receptor, alpha
REN           179820    Renin
RRAD          179503    RAD1; RAD; ras-related associated with diabetes
SAH           145505    SA homolog; SA
SCNN1B        600760    Epithelial sodium channel, beta subunit; ENaCb; sodium channel, non-voltage gated 1, beta
SCYA5         187011    Small inducible cytokine A5 (RANTES)
SDF1          600835    Stromal cell-derived factor 1; CXCL12
SELE          131210    E selectin; ELAM1; endothelial adhesion molecule 1; CD62E
SELP          173610    P-selectin; PSEL; CD62 antigen; CD62P; platelet alpha granule membrane protein 140kD; GRMP
SERPINA1      107400    Alpha-1-antitrypsin; protease inhibitor 1; PI
SERPINA3      107280    Alpha-1-antichymotrypsin; AACT
SERPINA8      106150    Angiotensinogen; AGT
SERPINE1      173360    Plasminogen activator inhibitor 1; PAI1
SFTPA1        178630    Pulmonary surfactant apoprotein; SPA; SP-A
SHBG          182205    Sex hormone-binding globulin
SLC11A1       600266    NRAMP1; natural resistance-associated macrophage protein 1
SLC2A1        138140    GLUT1; glucose transporter 1
SLC2A2        138160    GLUT2; glucose transporter 2
SLC6A3        126455    Dopamine transporter; DAT1
SLC6A4        182138    Serotonin transporter; 5HTT; SERT
SNAP25        600322    SNAP-25; synaptosomal-associated protein, 25 kDa
SNCA          163890    Synuclein, alpha
SOD2          147460    Superoxide dismutase 2, mitochondrial; manganese superoxide dismutase; MnSOD
SRD5A2        264600    Steroid-5-alpha-reductase, alpha polypeptide 2
T             601397    T Brachyury (mouse) homolog
TAP1          170260    Antigen peptide transporter; ABCB2
TAP2          170261    Antigen peptide transporter 2; ATP-binding cassette, sub-family B, member 2; ABCB2
TBXA2R        188070    Thromboxane A2 receptor
TCF1          142410    HNF1-alpha; transcription factor 1, hepatic
TF            190000    Transferrin
TFCP2         189889    Transcription factor CP2
TGFA          190170    Transforming growth factor, alpha
TGFB1         190180    Transforming growth factor, beta
TGFB3         190230    Transforming growth factor beta 3
TH            191290    Tyrosine hydroxylase
THBD          188040    Thrombomodulin; THRM; CD141
THRB          190160    Thyroid hormone receptor beta; ERBA2
TNF           191160    Tumor necrosis factor alpha; TNFA; TNF superfamily, member 2
TNFRSF6       134637    Fas antigen; CD95; tumor necrosis receptor superfamily, member 6; APT1
TP53          191170    Tumor protein p53
TPH           191060    Tryptophan hydroxylase
TPMT          187680    Thiopurine S-methyltransferase
TRA@          186880    T-cell receptor alpha locus
TRD@          186810    T-cell receptor delta locus
TRHR          188545    Thyrotropin-releasing hormone receptor
UCHL1         191342    Ubiquitin carboxy-terminal hydrolase L1; ubiquitin carboxy-terminal esterase L1
UGB           192020    CC16; uteroglobin; Clara cell-specific 16-kDa protein; CC10; CCSP
UGT1A1        191740    UDP glycosyltransferase 1 family, polypeptide A1; UDP-glucuronosyltransferase phenol/bilirubin; UGT1A
VDR           601769    Vitamin D receptor
VLDLR         192977    Very low density lipoprotein receptor
WFS1          222300    Wolfram syndrome 1; wolframin
WRN           604611    Werner syndrome; DNA helicase, recQ-like, type 3; RECQ3; RECQL2
XRCC3         600675    X-ray repair cross-complementing protein 3
YWHAH         113508    14-3-3 eta; tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide



For each gene, the official gene symbol from the Human Genome Organisation (HUGO, http://www.gene.ucl.ac.uk/nomenclature), the number for the Online
Mendelian Inheritance in Man (OMIM, Baltimore: Johns Hopkins University, Center for Medical Genetics, 1996, http://www.ncbi.nlm.nih.gov/omim), and common
aliases and descriptions are given.



[Figure 2: panel a, True positive association; panel b, False positive association. Each panel shows cases and controls drawn from Ethnic Group 1 and Ethnic Group 2.]

Fig. 2 True associations contrasted with false-positive associations due to ethnic
admixture. The open shapes represent individuals with disease, and the filled shapes
represent individuals from a control population. Shapes with a plus sign (+) represent
individuals carrying the putative risk allele being tested for association. In both figures, the
fraction of individuals carrying the risk allele is twice as large in the case population as in
the control population. Figure 2a (Top): True-positive association: the frequency of the
risk allele is greater in cases than in controls in both ethnic groups. Figure 2b (Bottom):
False-positive association due to ethnic admixture: the frequency of the risk allele is
identical in cases and controls in both populations. However, the allele is twice as frequent
overall in cases as in controls. This false appearance of association is due to ethnic
admixture, i.e., ethnic group 1 is overrepresented in the cases, and the allele being tested is
prevalent in ethnic group 1 but not ethnic group 2.

GENERAL CONCLUSIONS

How does one tell whether reported associations between
polymorphisms and disease are real? Reasonable criteria for
declaring association have been proposed, including low P values,
replication in multiple samples, and avoidance of population
stratification (such as by using family-based controls).
However, most studies do not meet these [...]
studies of an association are usually inconsistent. In these
cases, meta-analysis of all published studies may guide
interpretation, and we strongly advocate that any publication of an
association study (whether negative or positive) be accompanied
by a meta-analysis of all similar studies. Accordingly,
individual researchers should also publish or make easily
available sufficient information to facilitate future meta-analysis,
including relevant genotype and phenotype data. Publication
bias may present a major challenge to such analyses, because
the omission of small negative studies will bias the pooled data
toward a positive result. In this regard, we advocate a mechanism
for storage and dissemination of all association data
(published or not), perhaps in a widely accepted and curated
Web site and/or in brief "negative results" sections of specialty
journals. Until complete meta-analyses can be performed using
data from multiple large studies, we will be left with a scenario
in which the majority of reported associations are in
genetic purgatory, neither convincingly confirmed nor refuted,
awaiting future judgment.
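
As a rough illustration of how pooling works and why omitting negative studies matters, the sketch below combines odds ratios with a fixed-effect, inverse-variance weighting. This particular method is an assumption made for illustration, since the text does not specify a pooling procedure, and the allele counts are hypothetical.

```python
import math


def pooled_odds_ratio(studies):
    """Fixed-effect (inverse-variance) pooled odds ratio.

    Each study is (a, b, c, d): risk and other alleles in cases, then
    risk and other alleles in controls. All counts must be non-zero.
    Returns (pooled OR, lower 95% limit, upper 95% limit, two-sided P).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weight = 1 / variance          # larger studies carry more weight
        total_weight += weight
        weighted_sum += weight * log_or
    pooled_log = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    z = pooled_log / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * se),
        math.exp(pooled_log + 1.96 * se),
        p_value,
    )


# Hypothetical allele counts; the third study is small and "negative".
STUDIES = [
    (60, 140, 50, 150),
    (300, 700, 250, 750),
    (45, 155, 50, 150),
]

pooled_all = pooled_odds_ratio(STUDIES)
pooled_published = pooled_odds_ratio(STUDIES[:2])  # negative study omitted
```

Dropping the small negative study raises the pooled estimate, which is the direction of bias the text attributes to unpublished negative results.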

Much of the interest surrounding genetic association studies
centers on the potential clinical application of polymorphisms
that serve as markers for disease. In particular, it has been
proposed that these markers can serve both as predictors of disease
and as a means to tailor treatment of disease. Although this
scenario may well become reality, the current irreproducibility
of most studies should raise a loud cautionary alarm. Certainly,
clinical applications of genetic associations should not be
considered until the degree of certainty far exceeds the level
currently achieved for the vast majority of such associations.
Furthermore, even if an association is supported by extremely
convincing evidence, screening patients is only appropriate if
determining an individual's genotype would allow a clinically
proven beneficial intervention that outweighs the risk of
performing the test. Genetic tests also give rise to ethical
considerations, because of the implication for family members, the
potential for discrimination, the immutability of genetic risk
factors, and the predictive nature of such tests. (Although,
given the probable modest effects of any particular genetic
variant, most genetic tests are likely to be much less predictive
of future health than widely used screens such as blood pressure
and cholesterol measurements.) Societal consensus and
legislative solutions addressing these ethical concerns are
needed before such testing enters widespread clinical practice.

Because of the scientific and ethical uncertainties, a "DNA
chip" that can determine crucial genotypes and accurately predict 
future health is unlikely to become a widespread and useful 
screening tool in the near future, even if concerns regarding re- 
producibility can be resolved. Rather, the most likely short-term 
benefit from genetic association studies will be a better under- 
standing of disease pathogenesis, which will hopefully lead in turn 
to novel and better treatments and/or more tailored drug therapy. 
If genetic association studies can provide these sorts of advances, 
they will have proven a valuable resource in the struggle to under- 
stand and treat common disease. 